RE: How to build with XMLCh = wchar_t on Windows platform.

2018-01-24 Thread Mark Douglas
Thanks for the clarification Roger!

Kind regards,
Mark

-Original Message-
From: rle...@codelibre.net [mailto:rle...@codelibre.net] 
Sent: 24 January 2018 11:40
To: c-users@xerces.apache.org
Subject: Re: How to build with XMLCh = wchar_t on Windows platform.

On 2018-01-24 09:30, Mark Douglas wrote:
> As a side note, we have a lot of 32-bit C++ code that needs 
> conversion, but this is going to take time (other 3rd party libraries 
> involved). If you were to drop 32-bit support, this could become an 
> issue for us in the short-term. I'm not sure if there are other Xerces
> C++ users who have legacy 32-bit code that they need to maintain.
> Please don't drop 32-bit support too soon!

Sorry for any confusion: there is no intention to drop the ability to do 
32-bit builds of Xerces-C++ from source.  I was referring only to creating and 
distributing 32-bit builds at my workplace for my own projects.


Regards,
Roger


RE: How to build with XMLCh = wchar_t on Windows platform.

2018-01-24 Thread Mark Douglas
Hi Roger,

Thanks again for your feedback. Most useful.

In this particular case, our application is only reading/writing our own 
internal configuration files, so no user input as such. So I should be good not 
having to transcode. All the characters in our files are UTF-8 only.

Thanks for patching the wchar_t change so quickly! I'll give it a try 
(hopefully sometime later this week).

I certainly understand the current naming convention for the library and agree 
that it is currently correct. The problem I have is with our own application. 
I've been tasked with converting our application (actually a suite of desktop 
and web applications) to 64 bit. Our application is a mix of C++, C#, Java and 
COBOL (yes, it's that old) and the latest version of Java (soon to be released 
18.3) is ditching 32-bit support, forcing a move to 64-bit. We have a lot of 
JNI and C++ code that is used in our Java web applications and this code will 
all need to be converted. However, we don't have time to convert the entire 
suite and so we will be left with 32-bit components that will eventually be 
replaced with 64-bit web equivalents over time.

So, for a period of time, our applications will be a mix of 32 and 64-bit 
components. Some of those components will have to exist in both 32 and 64-bit 
versions as they are shared between the new web apps and the older 32-bit 
desktop apps. Some of these shared components use Xerces C++ and I will 
therefore require both 32 and 64-bit builds. Unfortunately, we will be forced 
to have these components *in the same folder*. Again, this is a short-term 
issue which will eventually be solved by moving to a different installer.

So it's all a bit of a mess that will eventually be resolved once we web enable 
everything and are purely 64-bit only.

TL;DR : We need both 32 and 64-bit versions of Xerces C++ to exist in the same 
folder at runtime.

I know this is probably a unique case, so implementing something just for this 
might be too much effort. I can always hand-craft the VS project files to 
generate the names I need, so not a big issue.

As a side note, we have a lot of 32-bit C++ code that needs conversion, but 
this is going to take time (other 3rd party libraries involved). If you were to 
drop 32-bit support, this could become an issue for us in the short-term. I'm 
not sure if there are other Xerces C++ users who have legacy 32-bit code that 
they need to maintain. Please don't drop 32-bit support too soon!

Kind regards,
Mark

-Original Message-
From: Roger Leigh [mailto:rle...@codelibre.net] 
Sent: 23 January 2018 22:46
To: c-users@xerces.apache.org
Subject: Re: How to build with XMLCh = wchar_t on Windows platform.

On 23/01/18 14:59, Mark Douglas wrote:
> Hi Roger,
> 
> I think Microsoft have had wchar_t as a type way before char16_t was 
> introduced (as far back as I can remember, which is getting shorter as I get 
> older :)). At the time, Microsoft were well known for doing things the 
> 'Microsoft way' and not following standards very well. So maybe they backed 
> themselves into a corner to some extent.
> 
> Indeed, your suggested edit has worked! I now have XERCES_XMLCH_T 
> defined as wchar_t :)

That's great to hear.

> I'm not quite sure why I'd need a transcoder if I'm using wchar_t across the 
> board, so hopefully I can get away without having to worry about it. The only 
> XML files we consume are UTF-8, so everything *should* just work. Fingers 
> crossed.

For the most part, it's unlikely to be used unless you have some
non-UTF-8 input.  Not sure if it's used for UTF-8/UTF-16 conversions.

If you would like to give it a try,
https://issues.apache.org/jira/browse/XERCESC-2132 now has a patch attached, or 
you can download directly from the linked github branch. 
Just add "-Dxmlch-type=wchar_t" when running CMake.

> I'm going to push my luck a bit here and ask an unrelated question. I need to 
> build both 32 and 64 bit binaries for our application (it's a long story), 
> but the Xerces C++ CMake system generates .LIBs and .DLLs with the same name. 
> Is it possible to specify that one or other of the builds generates different 
> output filenames? For example, I'd like to generate a xerces-c_3_2x64.dll for 
> my 64-bit build. At the moment I'm having to hand-craft the generated VS 
> project file to achieve this.

It's certainly possible to configure CMake to add architecture-specific 
suffixes.  However, there would be a good bit of breakage resulting from 
that--the DLL naming pattern is used by e.g. autoconf, CMake FindXercesC and 
other scripts to find the library, and this would cease to work. 
And there are likely many other users of the library who depend on the 
stability of the existing names.  As a result, I'd think keeping the existing 
scheme as the default would be necessary.

Re: How to build with XMLCh = wchar_t on Windows platform.

2018-01-23 Thread Roger Leigh

On 23/01/18 14:59, Mark Douglas wrote:

> Hi Roger,
> 
> I think Microsoft have had wchar_t as a type way before char16_t was introduced 
> (as far back as I can remember, which is getting shorter as I get older :)). At 
> the time, Microsoft were well known for doing things the 'Microsoft way' and 
> not following standards very well. So maybe they backed themselves into a 
> corner to some extent.
> 
> Indeed, your suggested edit has worked! I now have XERCES_XMLCH_T defined as 
> wchar_t :)

That's great to hear.

> I'm not quite sure why I'd need a transcoder if I'm using wchar_t across the 
> board, so hopefully I can get away without having to worry about it. The only XML 
> files we consume are UTF-8, so everything *should* just work. Fingers crossed.

For the most part, it's unlikely to be used unless you have some 
non-UTF-8 input.  Not sure if it's used for UTF-8/UTF-16 conversions.
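
Even with XMLCh = wchar_t, a UTF-8 document still has to be decoded from bytes into 16-bit UTF-16 code units before the parser can hand out XMLCh strings; that step is independent of which C++ type XMLCh maps to. The function below is a minimal, hypothetical sketch of that decoding, not Xerces-C++ code, and it skips checks (overlong encodings, encoded surrogates) that a production decoder needs.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: decodes UTF-8 bytes into UTF-16 code units.
// Throws std::runtime_error on malformed input. Overlong forms and encoded
// surrogates are NOT rejected here; a production decoder must reject them.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t len = 0;
        if (lead < 0x80)              { cp = lead;        len = 1; }  // 0xxxxxxx
        else if ((lead >> 5) == 0x06) { cp = lead & 0x1F; len = 2; }  // 110xxxxx
        else if ((lead >> 4) == 0x0E) { cp = lead & 0x0F; len = 3; }  // 1110xxxx
        else if ((lead >> 3) == 0x1E) { cp = lead & 0x07; len = 4; }  // 11110xxx
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (len > in.size() - i) throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b >> 6) != 0x02) throw std::runtime_error("invalid continuation byte");
            cp = (cp << 6) | (b & 0x3F);  // append 6 payload bits
        }
        if (cp > 0x10FFFF) throw std::runtime_error("code point out of range");

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));  // BMP: one code unit
        } else {
            cp -= 0x10000;                              // supplementary: surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}
```

Whichever component performs this step, the resulting code units are what end up in XMLCh strings, so a wchar_t build on Windows receives the same UTF-16 data as a char16_t build.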

If you would like to give it a try, 
https://issues.apache.org/jira/browse/XERCESC-2132 now has a patch 
attached, or you can download directly from the linked GitHub branch. 
Just add "-Dxmlch-type=wchar_t" when running CMake.

> I'm going to push my luck a bit here and ask an unrelated question. I need to 
> build both 32 and 64 bit binaries for our application (it's a long story), but 
> the Xerces C++ CMake system generates .LIBs and .DLLs with the same name. Is it 
> possible to specify that one or other of the builds generates different output 
> filenames? For example, I'd like to generate a xerces-c_3_2x64.dll for my 
> 64-bit build. At the moment I'm having to hand-craft the generated VS project 
> file to achieve this.

It's certainly possible to configure CMake to add architecture-specific 
suffixes.  However, there would be a good bit of breakage resulting from 
that--the DLL naming pattern is used by e.g. autoconf, CMake FindXercesC 
and other scripts to find the library, and this would cease to work. 
And there are likely many other users of the library who depend on the 
stability of the existing names.  As a result, I'd think keeping the 
existing scheme as the default would be necessary.  But, it should 
certainly be possible to add an option to add a suffix to the library 
names for custom builds of the library.  For example 
"-Dlibrary-suffix=my-suffix" or "-Dlibrary-suffix=TRUE" (for automatic 
addition of architecture).  Do you have any established convention for 
the names of the suffixes?
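
As a rough illustration of what such a suffix option would change, here is a hypothetical helper that composes the DLL name from the pattern mentioned in this thread (xerces-c_3_2.dll, with xerces-c_3_2x64.dll as the suffixed form). The function names and the "x64" choice for the automatic mode are assumptions, not an existing Xerces-C++ or CMake option.

```cpp
#include <string>

// Hypothetical helper: builds a Windows DLL name following the pattern
// discussed in the thread, appending an optional suffix before ".dll".
std::string xerces_dll_name(int major, int minor, const std::string& suffix = "") {
    return "xerces-c_" + std::to_string(major) + "_" + std::to_string(minor) +
           suffix + ".dll";
}

// Hypothetical "automatic" suffix, standing in for the proposed
// -Dlibrary-suffix=TRUE mode: 64-bit builds get "x64", 32-bit builds keep
// the unsuffixed default name.
std::string automatic_arch_suffix() {
    return sizeof(void*) == 8 ? "x64" : "";
}
```

Leaving the suffix empty reproduces the existing name, which is why such an option could stay off by default without breaking the scripts that look the library up by name.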


My team's strategy, which matches how things are typically done on Unix 
and with CMake, is to have a separate build directory for 32-bit builds 
and 64-bit builds so that the names never clash even when they are 
identical.  The libraries are unprefixed across the board, and 32-bit 
and 64-bit binaries are never mixed.  Likely we'll be dropping 32-bit 
builds at some point though; not sure anyone actually uses them...



Regards,
Roger


RE: How to build with XMLCh = wchar_t on Windows platform.

2018-01-23 Thread Mark Douglas
Hi Roger,

I think Microsoft have had wchar_t as a type way before char16_t was introduced 
(as far back as I can remember, which is getting shorter as I get older :)). At 
the time, Microsoft were well known for doing things the 'Microsoft way' and 
not following standards very well. So maybe they backed themselves into a 
corner to some extent.

Indeed, your suggested edit has worked! I now have XERCES_XMLCH_T defined as 
wchar_t :)

I'm not quite sure why I'd need a transcoder if I'm using wchar_t across the 
board, so hopefully I can get away without having to worry about it. The only XML 
files we consume are UTF-8, so everything *should* just work. Fingers crossed.

I'm going to push my luck a bit here and ask an unrelated question. I need to 
build both 32 and 64 bit binaries for our application (it's a long story), but 
the Xerces C++ CMake system generates .LIBs and .DLLs with the same name. Is it 
possible to specify that one or other of the builds generates different output 
filenames? For example, I'd like to generate a xerces-c_3_2x64.dll for my 
64-bit build. At the moment I'm having to hand-craft the generated VS project 
file to achieve this.

Many thanks,
Mark

-Original Message-
From: rle...@codelibre.net [mailto:rle...@codelibre.net] 
Sent: 23 January 2018 14:46
To: c-users@xerces.apache.org
Subject: Re: How to build with XMLCh = wchar_t on Windows platform.

On 2018-01-23 14:12, Mark Douglas wrote:
> Hi Roger,
> 
> Thank you very much for this valuable feedback! As I'm new to CMake, I 
> didn't find the option for disabling char16_t (at least I wasn't 
> looking for the right thing to start with!).
> 
> I think the default policy of using char16_t, if it is available, is a 
> good choice - cross-platform consistency should be maintained where 
> possible. The reason for wanting to use wchar_t is that I'm 
> moving some legacy code from Xerces C++ 1.5.1 to 3.2.0 and a LOT of 
> the application code is using wchar_t as the character type. I've also 
> now enabled the VC++ option 'Treat WChar_t As Built in Type' 
> (/Zc:wchar_t) in my application, meaning that wchar_t is a distinct 
> built-in type rather than a typedef for unsigned short, so it is no 
> longer interchangeable with 16-bit integer types.
> 
> If I were to use char16_t for Xerces C++ I'd need to make a lot of
> XMLString::transcode() calls in my application to perform the 
> conversion. Either that, or I'd have to insert a lot of 
> reinterpret_cast etc. throughout the code. So for me, using 
> wchar_t seems like the least invasive of the two options and also 
> means I don't need to perform any transcoding in order to interface 
> with the rest of the application.

That does sound like a pain and is completely understandable.  It's a shame 
that they didn't make wchar_t a typedef for char16_t or vice versa, but I'm 
sure there were reasons for not doing so.  Probably because the wchar_t 
encoding is unspecified, and it would also prevent overloading based on the 
type if they are the same underlying type.
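
The overloading point can be seen directly: because wchar_t and char16_t are distinct built-in types, a library can overload on both, which a shared typedef would rule out. The same applies to wchar_t versus unsigned short, which is why the 'Treat WChar_t As Built in Type' setting matters on Visual C++: with it turned off, wchar_t becomes a typedef for unsigned short and the third overload below would be a redefinition. The describe overloads are hypothetical, for illustration only.

```cpp
#include <string>
#include <type_traits>

// wchar_t and char16_t are distinct types even where both are 16 bits wide.
static_assert(!std::is_same<wchar_t, char16_t>::value, "distinct types");
// Pointers to them do not convert implicitly, hence casts or copies at API boundaries.
static_assert(!std::is_convertible<const char16_t*, const wchar_t*>::value,
              "no implicit pointer conversion");

// Hypothetical overload set: only valid because the three types are distinct.
std::string describe(const wchar_t*) { return "wchar_t"; }
std::string describe(const char16_t*) { return "char16_t"; }
std::string describe(const unsigned short*) { return "unsigned short"; }
```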

> I think that adding an option to force wchar_t use as you suggest 
> would be a valuable addition - at least on platforms where wchar_t is 
> 16-bit.

I have opened https://issues.apache.org/jira/browse/XERCESC-2132 to track this.  
It should be fairly straightforward to add this for 3.2.1.  In the interim, 
hopefully the edit I suggested will achieve the same effect by hand with 3.2.0.

> By the way, I wasn't quite sure what the CMake '-Dtranscoder=windows'
> option did.

It's the default transcoding implementation on Windows using functionality 
built into Windows (src/xercesc/util/Transcoders/Win32/Win32TransService.cpp).  
You can see the selection in src/CMakeLists.txt -- search for 
XERCES_USE_TRANSCODER_.  On Windows, you could use ICU as an alternative, or 
GNU iconv if you built it from source.  If you wanted absolutely consistent 
cross-platform behaviour, then ICU might be a good choice (I do this for my 
work projects--we build and use ICU for builds of Xerces-C++ on all platforms).  
But the default should be fine in practice, so you could just leave it 
unspecified--it should default to "windows".


Regards,
Roger


RE: How to build with XMLCh = wchar_t on Windows platform.

2018-01-23 Thread Mark Douglas
Hi Roger,

Thank you very much for this valuable feedback! As I'm new to CMake, I didn't 
find the option for disabling char16_t (at least I wasn't looking for the right 
thing to start with!).

I think the default policy of using char16_t, if it is available, is a good 
choice - cross-platform consistency should be maintained where possible. The 
reason for wanting to use wchar_t is that I'm moving some legacy code from 
Xerces C++ 1.5.1 to 3.2.0 and a LOT of the application code is using wchar_t as 
the character type. I've also now enabled the VC++ option 'Treat WChar_t As 
Built in Type' (/Zc:wchar_t) in my application, meaning that wchar_t is a 
distinct built-in type rather than a typedef for unsigned short, so it is no 
longer interchangeable with 16-bit integer types.

If I were to use char16_t for Xerces C++ I'd need to make a lot of 
XMLString::transcode() calls in my application to perform the conversion. 
Either that, or I'd have to insert a lot of reinterpret_cast etc. 
throughout the code. So for me, using wchar_t seems like the least invasive of 
the two options and also means I don't need to perform any transcoding in order 
to interface with the rest of the application.
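
If XMLCh stayed as char16_t, boundary code like this would have to move UTF-16 data between the two types. A minimal sketch of the copying alternative follows; to_wide and to_u16 are hypothetical helpers, not Xerces-C++ API. Unlike reinterpret_cast between wchar_t* and char16_t*, which is technically undefined behaviour under the C++ aliasing rules even though the layouts match on Windows, an element-wise copy is well defined.

```cpp
#include <string>

// Hypothetical helpers: copy UTF-16 code units between char16_t and wchar_t
// strings. The conversion is faithful only where wchar_t holds UTF-16 (Windows);
// on platforms with a 32-bit wchar_t, surrogate pairs are copied unit by unit
// rather than combined into single code points.
std::wstring to_wide(const std::u16string& s) {
    return std::wstring(s.begin(), s.end());
}

std::u16string to_u16(const std::wstring& s) {
    // Each wchar_t is converted to char16_t; values above 0xFFFF would be
    // truncated on platforms with a 32-bit wchar_t.
    return std::u16string(s.begin(), s.end());
}
```

Every call site still pays for a copy, which is the kind of overhead that building Xerces-C++ with XMLCh = wchar_t avoids.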

I think that adding an option to force wchar_t use as you suggest would be a 
valuable addition - at least on platforms where wchar_t is 16-bit.

By the way, I wasn't quite sure what the CMake '-Dtranscoder=windows' option 
did.

Kind regards,
Mark

-Original Message-
From: rle...@codelibre.net [mailto:rle...@codelibre.net] 
Sent: 23 January 2018 13:57
To: c-users@xerces.apache.org
Subject: Re: How to build with XMLCh = wchar_t on Windows platform.

On 2018-01-23 12:03, Mark Douglas wrote:
> Hi,
> 
> This is my first mail to this group, so hopefully I've come to the 
> correct place!

Yes, it is!

> I'm currently attempting to build and use Xerces C++ 3.2.0 in my 
> application, but I'm having an issue where XMLCh is defined as 
> char16_t (not compatible with wchar_t). From what I've read, I'd 
> actually like XMLCh to be defined as wchar_t instead. My code will 
> only ever be run on Windows, so no fear of cross platform 
> incompatibility.
> 
> As I understand it, if XMLCh is defined as wchar_t, I won't see any 
> errors in my own code where the compiler can't convert from XMLCh* to
> wchar_t* etc.
> 
> So my questions are as follows:
> 
> 
> 1.   Is wchar_t something that is still supported by Xerces C++?

Yes.  In cmake/XercesXMLCh.cmake we check for char16_t and wchar_t support.  We 
default to using a 16-bit integer type i.e. uint16_t, but enable char16_t if 
available, and on Windows we fall back to wchar_t if char16_t is unsupported 
(on other platforms wchar_t is 32-bit and unsuitable).  Most non-Windows 
platforms will be using uint16_t, or char16_t if using a new compiler (and 
cmake).
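
The selection order described here can be modelled as a small decision function. This is a minimal sketch of the policy as stated in this message, not the actual logic in cmake/XercesXMLCh.cmake; the struct and function names are hypothetical, and "wchar_t is usable" is approximated as "wchar_t is 16-bit", which holds on Windows.

```cpp
#include <string>

// Hypothetical summary of the configure-time checks described above.
struct XMLChChecks {
    bool have_char16_t;  // compiler (and CMake) support char16_t
    bool have_wchar_t;   // compiler supports wchar_t
    int wchar_t_bits;    // 16 on Windows, usually 32 elsewhere
};

// Models the described policy: prefer char16_t, fall back to a 16-bit
// wchar_t (i.e. Windows), otherwise use a plain 16-bit integer type.
std::string select_xmlch(const XMLChChecks& c) {
    if (c.have_char16_t) {
        return "char16_t";
    }
    if (c.have_wchar_t && c.wchar_t_bits == 16) {
        return "wchar_t";
    }
    return "uint16_t";
}
```

In this model, forcing have_char16_t to false on Windows, which is the effect of the set(HAVE_STD_char16_t FALSE) edit suggested below, selects wchar_t.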

> 2.   What's the correct way of defining XMLCh as wchar_t instead
> of char16_t? I changed the definition of XERCES_XMLCH_T in 
> Xerces_autoconf_config.hpp, but then not all of the tests compiled.
> I'm guessing I need to do something in one of the CMake files?

That file, cmake/XercesXMLCh.cmake, would be the place to do so.  Just add

   set(HAVE_STD_char16_t FALSE)

after the check.
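
Once the edit is in place, application code can fail fast if it is ever compiled against a build with a different XMLCh. A minimal sketch, assuming (as in the 3.x headers) that XMLCh is a typedef of XERCES_XMLCH_T from the generated configuration header; the stand-in #define and the as_wide helper are hypothetical and exist only so the snippet compiles on its own.

```cpp
#include <type_traits>

// Stand-in so this sketch compiles without Xerces-C++ installed. In a real
// application, remove these lines and include the Xerces-C++ headers instead,
// which provide XERCES_XMLCH_T and the XMLCh typedef.
#ifndef XERCES_XMLCH_T
#define XERCES_XMLCH_T wchar_t
#endif
typedef XERCES_XMLCH_T XMLCh;

// Fail the build early if the library was configured with another XMLCh type.
static_assert(std::is_same<XMLCh, wchar_t>::value,
              "Xerces-C++ was not configured with XMLCh = wchar_t");

// Hypothetical helper: with matching types, no cast or transcoding is needed.
inline const wchar_t* as_wide(const XMLCh* s) { return s; }
```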

> 3.   Are there any real reasons why I shouldn't attempt this?

No, it's absolutely still supported.  One thing we might want to do here is to 
add an option to allow the use of wchar_t to be forced, so you could configure 
with e.g. -Dxmlch=wchar_t or -Dxmlch-char16_t=OFF.

An interesting question is what the default policy should be.

- Using wchar_t is good for compatibility with older Visual Studio versions and 
other Windows APIs
- char16_t is good for cross-platform compatibility since we have the same 
XMLCh type on all modern platforms, and we can also use Unicode string literals 
directly in our sources e.g. u"A UTF-16 string", which makes using Xerces-C++ 
vastly more pleasant.

I'd be interested to know your requirements for needing to use wchar_t.  
Also, for anyone else using char16_t/wchar_t what your needs and preferences 
are.  Portable Xerces-C++ programs should work transparently with either, up to 
the point you need to pass the strings to other library APIs.
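
For code that has to compile with either XMLCh choice, one common approach is to hide the string-literal prefix behind a macro. A minimal sketch; XMLChSketch, SKETCH_XMLCH_IS_WCHAR and SKETCH_XMLSTR are hypothetical names, and in a real program the type would come from the Xerces-C++ headers rather than this stand-in (and XMLString::stringLen would replace the hand-written length function).

```cpp
#include <cstddef>

// Stand-in for the library's XMLCh, selected here by a hypothetical macro.
#if defined(SKETCH_XMLCH_IS_WCHAR)
typedef wchar_t XMLChSketch;
#define SKETCH_XMLSTR(s) L##s   // wide literal when XMLCh is wchar_t
#else
typedef char16_t XMLChSketch;
#define SKETCH_XMLSTR(s) u##s   // UTF-16 literal when XMLCh is char16_t
#endif

// Length of a null-terminated XMLChSketch string, in code units.
std::size_t xml_length(const XMLChSketch* s) {
    std::size_t n = 0;
    while (s[n] != 0) {
        ++n;
    }
    return n;
}
```

Only the typedef and the macro depend on the configuration, so the rest of the code is unchanged whichever type the library was built with.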

Kind regards,
Roger

