RE: How to build with XMLCh = wchar_t on Windows platform.
Thanks for the clarification Roger! Kind regards, Mark -Original Message- From: rle...@codelibre.net [mailto:rle...@codelibre.net] Sent: 24 January 2018 11:40 To: c-users@xerces.apache.org Subject: Re: How to build with XMLCh = wchar_t on Windows platform. On 2018-01-24 09:30, Mark Douglas wrote: > As a side note, we have a lot of 32-bit C++ code that needs > conversion, but this is going to take time (other 3rd party libraries > involved). If you were to drop 32-bit support, this could become an > issue for us in the short-term. I'm not sure if there are other Xerces > C++ users who have legacy 32-bit code that they need to maintain. > Please don't drop 32-bit support too soon! Sorry for any confusion, there is no intention to drop the ability for doing 32-bit builds of Xerces-C++ from source. This was rather the creation and distribution of 32-bit builds at my workplace for my own projects. Regards, Roger
Re: How to build with XMLCh = wchar_t on Windows platform.
On 2018-01-24 09:30, Mark Douglas wrote: As a side note, we have a lot of 32-bit C++ code that needs conversion, but this is going to take time (other 3rd party libraries involved). If you were to drop 32-bit support, this could become an issue for us in the short-term. I'm not sure if there are other Xerces C++ users who have legacy 32-bit code that they need to maintain. Please don't drop 32-bit support too soon! Sorry for any confusion, there is no intention to drop the ability for doing 32-bit builds of Xerces-C++ from source. This was rather the creation and distribution of 32-bit builds at my workplace for my own projects. Regards, Roger
RE: How to build with XMLCh = wchar_t on Windows platform.
Hi Roger, Thanks again for your feedback. Most useful. In this particular case, our application is only reading/writing our own internal configuration files, so no user input as such. So I should be good not having to transcode. All the characters in our files are UTF-8 only. Thanks for patching the wchar_t change so quickly! I'll give it a try (hopefully sometime later this week). I certainly understand the current naming convention for the library and agree that it is currently correct. The problem I have is with our own application. I've been tasked with converting our application (actually a suite of desktop and web applications) to 64-bit. Our application is a mix of C++, C#, Java and COBOL (yes, it's that old) and the latest version of Java (soon to be released 18.3) is ditching 32-bit support, forcing a move to 64-bit. We have a lot of JNI and C++ code that is used in our Java web applications and this code will all need to be converted. However, we don't have time to convert the entire suite and so we will be left with 32-bit components that will eventually be replaced with 64-bit web equivalents over time. So, for a period of time, our applications will be a mix of 32 and 64-bit components. Some of those components will have to exist in both 32 and 64-bit versions as they are shared between the new web apps and the older 32-bit desktop apps. Some of these shared components use Xerces C++ and I will therefore require both 32 and 64-bit builds. Unfortunately, we will be forced to have these components *in the same folder*. Again, this is a short-term issue which will eventually be solved by moving to a different installer. So it's all a bit of a mess that will eventually be resolved once we web enable everything and are purely 64-bit only. TL;DR: We need both 32 and 64-bit versions of Xerces C++ to exist in the same folder at runtime. I know this is probably a unique case, so implementing something just for this might be too much effort.
I can always hand-craft the VS project files to generate the names I need, so not a big issue. As a side note, we have a lot of 32-bit C++ code that needs conversion, but this is going to take time (other 3rd party libraries involved). If you were to drop 32-bit support, this could become an issue for us in the short-term. I'm not sure if there are other Xerces C++ users who have legacy 32-bit code that they need to maintain. Please don't drop 32-bit support too soon! Kind regards, Mark -Original Message- From: Roger Leigh [mailto:rle...@codelibre.net] Sent: 23 January 2018 22:46 To: c-users@xerces.apache.org Subject: Re: How to build with XMLCh = wchar_t on Windows platform. On 23/01/18 14:59, Mark Douglas wrote: > Hi Roger, > > I think Microsoft have had wchar_t as a type way before char16_t was > introduced (as far back as I can remember, which is getting shorter as I get > older :)). At the time, Microsoft were well known for doing things the > 'Microsoft way' and not following standards very well. So maybe they backed > themselves into a corner to some extent. > > Indeed, your suggested edit has worked! I now have XERCES_XMLCH_T > defined as wchar_t :) That's great to hear. > I'm not quite sure why I'd need a transcoder if I'm using wchar_t across the > board, so hopefully I can get away without having to worry about it. The only > XML files we consume are UTF-8, so everything *should* just work. Fingers > crossed. For the most part, it's unlikely to be used unless you have some non-UTF-8 input. Not sure if it's used for UTF-8/UTF-16 conversions. If you would like to give it a try, https://issues.apache.org/jira/browse/XERCESC-2132 now has a patch attached, or you can download directly from the linked github branch. Just add "-Dxmlch-type=wchar_t" when running CMake. > I'm going to push my luck a bit here and ask an unrelated question. 
I need to > build both 32 and 64 bit binaries for our application (it's a long story), > but the Xerces C++ CMake system generates .LIBs and .DLLs with the same name. > Is it possible to specify that one or other of the builds generates different > output filenames? For example, I'd like to generate a xerces-c_3_2x64.dll for > my 64-bit build. At the moment I'm having to hand-craft the generated VS > project file to achieve this. It's certainly possible to configure CMake to add architecture-specific suffixes. However, there would be a good bit of breakage resulting from that--the DLL naming pattern is used by e.g. autoconf, CMake FindXercesC and other scripts to find the library, and this would cease to work. And there are likely many other users of the library who depend on the stability of the existing names. As a result, I'd think keeping the existing scheme as the d
Re: How to build with XMLCh = wchar_t on Windows platform.
On 23/01/18 14:59, Mark Douglas wrote: Hi Roger, I think Microsoft have had wchar_t as a type way before char16_t was introduced (as far back as I can remember, which is getting shorter as I get older :)). At the time, Microsoft were well known for doing things the 'Microsoft way' and not following standards very well. So maybe they backed themselves into a corner to some extent. Indeed, your suggested edit has worked! I now have XERCES_XMLCH_T defined as wchar_t :) That's great to hear. I'm not quite sure why I'd need a transcoder if I'm using wchar_t across the board, so hopefully I can get away without having to worry about it. The only XML files we consume are UTF-8, so everything *should* just work. Fingers crossed. For the most part, it's unlikely to be used unless you have some non-UTF-8 input. Not sure if it's used for UTF-8/UTF-16 conversions. If you would like to give it a try, https://issues.apache.org/jira/browse/XERCESC-2132 now has a patch attached, or you can download directly from the linked github branch. Just add "-Dxmlch-type=wchar_t" when running CMake. I'm going to push my luck a bit here and ask an unrelated question. I need to build both 32 and 64 bit binaries for our application (it's a long story), but the Xerces C++ CMake system generates .LIBs and .DLLs with the same name. Is it possible to specify that one or other of the builds generates different output filenames? For example, I'd like to generate a xerces-c_3_2x64.dll for my 64-bit build. At the moment I'm having to hand-craft the generated VS project file to achieve this. It's certainly possible to configure CMake to add architecture-specific suffixes. However, there would be a good bit of breakage resulting from that--the DLL naming pattern is used by e.g. autoconf, CMake FindXercesC and other scripts to find the library, and this would cease to work. And there are likely many other users of the library who depend on the stability of the existing names. 
As a result, I'd think keeping the existing scheme as the default would be necessary. But, it should certainly be possible to add an option to add a suffix to the library names for custom builds of the library. For example "-Dlibrary-suffix=my-suffix" or "-Dlibrary-suffix=TRUE" (for automatic addition of architecture). Do you have any established convention for the names of the suffixes? My team's strategy, which matches how things are typically done on Unix and with CMake, is to have a separate build directory for 32-bit builds and 64-bit builds so that the names never clash even when they are identical. The libraries are unprefixed across the board, and 32-bit and 64-bit binaries are never mixed. Likely we'll be dropping 32-bit builds at some point though; not sure anyone actually uses them... Regards, Roger
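To illustrate the same-folder case discussed above, an application could choose which Xerces-C++ DLL name to load from its own pointer width. This is only a sketch under assumptions: the `x64` suffix is taken from Mark's `xerces-c_3_2x64.dll` example, the unsuffixed `xerces-c_3_2.dll` is assumed to be the stock name, and `xercesDllName` is a hypothetical helper that is not part of Xerces-C++ or its CMake build.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not a Xerces-C++ API): returns the DLL file name to
// load for a given pointer width in bytes. The "x64" suffix follows the
// xerces-c_3_2x64.dll example from the thread; the unsuffixed name is an
// assumption about the stock build.
inline std::string xercesDllName(std::size_t pointerBytes)
{
    assert(pointerBytes == 4 || pointerBytes == 8);
    const std::string base = "xerces-c_3_2";
    const std::string suffix = (pointerBytes == 8) ? "x64" : "";
    return base + suffix + ".dll";
}

// Name matching the architecture this translation unit is compiled for.
inline std::string xercesDllNameForThisBuild()
{
    return xercesDllName(sizeof(void*));
}
```

With separate build directories, as Roger describes, no such selection is needed because the two builds never share a folder.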
RE: How to build with XMLCh = wchar_t on Windows platform.
Hi Roger, I think Microsoft have had wchar_t as a type way before char16_t was introduced (as far back as I can remember, which is getting shorter as I get older :)). At the time, Microsoft were well known for doing things the 'Microsoft way' and not following standards very well. So maybe they backed themselves into a corner to some extent. Indeed, your suggested edit has worked! I now have XERCES_XMLCH_T defined as wchar_t :) I'm not quite sure why I'd need a transcoder if I'm using wchar_t across the board, so hopefully I can get away without having to worry about it. The only XML files we consume are UTF-8, so everything *should* just work. Fingers crossed. I'm going to push my luck a bit here and ask an unrelated question. I need to build both 32 and 64 bit binaries for our application (it's a long story), but the Xerces C++ CMake system generates .LIBs and .DLLs with the same name. Is it possible to specify that one or other of the builds generates different output filenames? For example, I'd like to generate a xerces-c_3_2x64.dll for my 64-bit build. At the moment I'm having to hand-craft the generated VS project file to achieve this. Many thanks, Mark -Original Message- From: rle...@codelibre.net [mailto:rle...@codelibre.net] Sent: 23 January 2018 14:46 To: c-users@xerces.apache.org Subject: Re: How to build with XMLCh = wchar_t on Windows platform. On 2018-01-23 14:12, Mark Douglas wrote: > Hi Roger, > > Thank you very much for this valuable feedback! As I'm new to CMake, I > didn't find the options of disabling char16_t (at least I wasn't > looking for the right thing to start with!). > > I think the default policy of using char16_t, if it is available, is a > good choice - cross platform consistency should be maintained where > possible I think. The reason for wanting to use wchar_t is that I'm > moving some legacy code from Xerces C++ 1.5.1 to 3.2.0 and a LOT of > the application code is using wchar_t as the character type. 
I've also > now selected the VC++ option to 'Treat WChar_t as a Built in Type' in > my application meaning that it's no longer compatible with 16-bit > integer values. > > If I were to use char16_t for Xerces C++ I'd need to make a lot of > XMLString::transcode() calls in my application to perform the > conversion. Either that, or I'd have to insert a lot of > reinterpret_cast etc. throughout the code. So for me, using > wchar_t seems like the least invasive of the two options and also > means I don't need to perform any transcoding in order to interface > with the rest of the application. That does sound like a pain and is completely understandable. It's a shame that they didn't make wchar_t a typedef for char16_t or vice versa, but I'm sure there were reasons for not doing so. Probably because the wchar_t encoding is unspecified, and it would also prevent overloading based on the type if they are the same underlying type. > I think that adding an option to force wchar_t use as you suggest > would be a valuable addition - at least on platforms where wchar_t is > 16-bit. I have opened https://issues.apache.org/jira/browse/XERCESC-2132 to track this. It should be fairly straightforward to add this for 3.2.1. In the interim, hopefully the edit I suggested will achieve the same effect by hand with 3.2.0. > By the way, I wasn't quite sure what the CMake '-Dtranscoder=windows' > option did. It's the default transcoding implementation on Windows using functionality built into Windows (src/xercesc/util/Transcoders/Win32/Win32TransService.cpp). You can see the selection in src/CMakeLists.txt -- search for XERCES_USE_TRANSCODER_. On Windows, you could use ICU as an alternative, or GNU iconv if you built it from source. If you wanted absolutely consistent cross-platform behaviour, then ICU might be a good choice (I do this for my work projects--we build and use ICU for builds of Xerces-C++ on all platforms). 
But the default should be fine in practice, so you could just leave it unspecified--it should default to "windows". Regards, Roger
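Roger's remark about overloading can be shown with a few lines of standard C++. In this sketch, `describe` is a hypothetical function name and nothing here is Xerces-C++ API. Because `char16_t` and `wchar_t` are distinct types, both overloads can coexist, which would not be possible if one were a typedef for the other.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// char16_t and wchar_t are distinct types in standard C++, even on platforms
// where both are 16 bits wide, so these two overloads do not collide.
inline std::string describe(const char16_t* s)
{
    assert(s != nullptr);
    return "char16_t";
}

inline std::string describe(const wchar_t* s)
{
    assert(s != nullptr);
    return "wchar_t";
}

// If one type were a typedef for the other, this would fail and the two
// overloads above would be a redefinition error.
static_assert(!std::is_same<char16_t, wchar_t>::value,
              "char16_t and wchar_t are distinct types");
```

The same distinctness is why a `char16_t*` cannot be passed where a `wchar_t*` is expected without a cast or a conversion, which is the friction Mark describes.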
Re: How to build with XMLCh = wchar_t on Windows platform.
On 2018-01-23 14:12, Mark Douglas wrote: Hi Roger, Thank you very much for this valuable feedback! As I'm new to CMake, I didn't find the options of disabling char16_t (at least I wasn't looking for the right thing to start with!). I think the default policy of using char16_t, if it is available, is a good choice - cross platform consistency should be maintained where possible I think. The reason for wanting to use wchar_t is that I'm moving some legacy code from Xerces C++ 1.5.1 to 3.2.0 and a LOT of the application code is using wchar_t as the character type. I've also now selected the VC++ option to 'Treat WChar_t as a Built in Type' in my application meaning that it's no longer compatible with 16-bit integer values. If I were to use char16_t for Xerces C++ I'd need to make a lot of XMLString::transcode() calls in my application to perform the conversion. Either that, or I'd have to insert a lot of reinterpret_cast etc. throughout the code. So for me, using wchar_t seems like the least invasive of the two options and also means I don't need to perform any transcoding in order to interface with the rest of the application. That does sound like a pain and is completely understandable. It's a shame that they didn't make wchar_t a typedef for char16_t or vice versa, but I'm sure there were reasons for not doing so. Probably because the wchar_t encoding is unspecified, and it would also prevent overloading based on the type if they are the same underlying type. I think that adding an option to force wchar_t use as you suggest would be a valuable addition - at least on platforms where wchar_t is 16-bit. I have opened https://issues.apache.org/jira/browse/XERCESC-2132 to track this. It should be fairly straightforward to add this for 3.2.1. In the interim, hopefully the edit I suggested will achieve the same effect by hand with 3.2.0. By the way, I wasn't quite sure what the CMake '-Dtranscoder=windows' option did. 
It's the default transcoding implementation on Windows using functionality built into Windows (src/xercesc/util/Transcoders/Win32/Win32TransService.cpp). You can see the selection in src/CMakeLists.txt -- search for XERCES_USE_TRANSCODER_. On Windows, you could use ICU as an alternative, or GNU iconv if you built it from source. If you wanted absolutely consistent cross-platform behaviour, then ICU might be a good choice (I do this for my work projects--we build and use ICU for builds of Xerces-C++ on all platforms). But the default should be fine in practice, so you could just leave it unspecified--it should default to "windows". Regards, Roger
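To make concrete what a transcoding service does, here is a standard-library-only sketch of converting UTF-8 bytes to UTF-16 code units. It is not the code of the Win32 transcoder and uses no Xerces-C++ API; `utf8ToUtf16` is a hypothetical name, and error handling is simplified (it throws on malformed input and does not reject overlong encodings).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into UTF-16 code units.
inline std::u16string utf8ToUtf16(const std::string& in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (i + extra >= in.size())
            throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;
        assert(i <= in.size());

        if (cp > 0x10FFFF)
            throw std::runtime_error("code point out of Unicode range");
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

A real transcoder such as the Windows or ICU one also handles many other encodings, which is why the choice mainly matters for non-UTF-8 input.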
RE: How to build with XMLCh = wchar_t on Windows platform.
Hi Roger, Thank you very much for this valuable feedback! As I'm new to CMake, I didn't find the options of disabling char16_t (at least I wasn't looking for the right thing to start with!). I think the default policy of using char16_t, if it is available, is a good choice - cross platform consistency should be maintained where possible I think. The reason for wanting to use wchar_t is that I'm moving some legacy code from Xerces C++ 1.5.1 to 3.2.0 and a LOT of the application code is using wchar_t as the character type. I've also now selected the VC++ option to 'Treat WChar_t as a Built in Type' in my application meaning that it's no longer compatible with 16-bit integer values. If I were to use char16_t for Xerces C++ I'd need to make a lot of XMLString::transcode() calls in my application to perform the conversion. Either that, or I'd have to insert a lot of reinterpret_cast etc. throughout the code. So for me, using wchar_t seems like the least invasive of the two options and also means I don't need to perform any transcoding in order to interface with the rest of the application. I think that adding an option to force wchar_t use as you suggest would be a valuable addition - at least on platforms where wchar_t is 16-bit. By the way, I wasn't quite sure what the CMake '-Dtranscoder=windows' option did. Kind regards, Mark -Original Message- From: rle...@codelibre.net [mailto:rle...@codelibre.net] Sent: 23 January 2018 13:57 To: c-users@xerces.apache.org Subject: Re: How to build with XMLCh = wchar_t on Windows platform. On 2018-01-23 12:03, Mark Douglas wrote: > Hi, > > This is my first mail to this group, so hopefully I've come to the > correct place! Yes, it is! > I'm currently attempting to build and use Xerces C++ 3.2.0 in my > application, but I'm having an issue where XMLCh is defined as > char16_t (not compatible with wchar_t). From what I've read, I'd > actually like XMLCh to be defined as wchar_t instead. 
My code will > only ever be run on Windows, so no fear of cross platform > incompatibility. > > As I understand it, if XMLCh is defined as wchar_t, I won't see any > errors in my own code where the compiler can't convert from XMLCh* to > wchar_t* etc. > > So my questions are as follows: > > > 1. Is wchar_t something that is still supported by Xerces C++? Yes. In cmake/XercesXMLCh.cmake we check for char16_t and wchar_t support. We default to using a 16-bit integer type i.e. uint16_t, but enable char16_t if available, and on Windows we fall back to wchar_t if char16_t is unsupported (on other platforms wchar_t is 32-bit and unsuitable). Most non-Windows platforms will be using uint16_t, or char16_t if using a new compiler (and cmake). > 2. What's the correct way of defining XMLCh as wchar_t instead > of char16_t? I changed the definition of XERCES_XMLCH_T in > Xerces_autoconf_config.hpp, but then not all of the tests compiled. > I'm guessing I need to do something in one of the CMake files? The above file would be the place to do so. Just add set(HAVE_STD_char16_t FALSE) after the check. > 3. Are there any real reasons why I shouldn't attempt this? No, it's absolutely still supported. One thing we might want to do here is to add an option to allow the use of wchar_t to be forced, so you could configure with e.g. -Dxmlch=wchar_t or -Dxmlch-char16_t=OFF. An interesting question is what the default policy should be. - Using wchar_t is good for compatibility with older Visual Studio versions and other Windows APIs - char16_t is good for cross-platform compatibility since we have the same XMLCh type on all modern platforms, and we can also use Unicode string literals directly in our sources e.g. u"A UTF-16 string", which makes using Xerces-C++ vastly more pleasant. I'd be interested to know your requirements for needing to use wchar_t. Also, for anyone else using char16_t/wchar_t what your needs and preferences are. 
Portable Xerces-C++ programs should work transparently with either, up to the point you need to pass the strings to other library APIs. Kind regards, Roger
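The selection order Roger describes for XMLCh can be pictured as a preprocessor cascade. This is a sketch only: the real checks live in cmake/XercesXMLCh.cmake rather than in a header like this, and the macro names `SKETCH_HAVE_CHAR16_T` and `SKETCH_FORCE_WCHAR_T`, the alias `XmlChSketch`, and the helper `xmlChLength` are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the result of the build system's char16_t check.
#ifndef SKETCH_HAVE_CHAR16_T
#  define SKETCH_HAVE_CHAR16_T 1   // assume a C++11 compiler
#endif

#if defined(_WIN32) && defined(SKETCH_FORCE_WCHAR_T)
typedef wchar_t XmlChSketch;        // proposed opt-in: force wchar_t on Windows
#elif SKETCH_HAVE_CHAR16_T
typedef char16_t XmlChSketch;       // preferred when the compiler supports it
#elif defined(_WIN32)
typedef wchar_t XmlChSketch;        // Windows fallback: wchar_t is 16-bit there
#else
typedef std::uint16_t XmlChSketch;  // default 16-bit integer type
#endif

static_assert(sizeof(XmlChSketch) == 2,
              "the character type must hold one UTF-16 code unit");

// Counts code units up to the terminating zero, whichever type was selected.
inline std::size_t xmlChLength(const XmlChSketch* s)
{
    assert(s != nullptr);
    std::size_t n = 0;
    while (s[n] != 0) ++n;
    return n;
}
```

Code written only against the alias, like `xmlChLength`, compiles unchanged under any of the branches, which is the sense in which portable programs work with either type.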
Re: How to build with XMLCh = wchar_t on Windows platform.
On 2018-01-23 12:03, Mark Douglas wrote: Hi, This is my first mail to this group, so hopefully I've come to the correct place! Yes, it is! I'm currently attempting to build and use Xerces C++ 3.2.0 in my application, but I'm having an issue where XMLCh is defined as char16_t (not compatible with wchar_t). From what I've read, I'd actually like XMLCh to be defined as wchar_t instead. My code will only ever be run on Windows, so no fear of cross platform incompatibility. As I understand it, if XMLCh is defined as wchar_t, I won't see any errors in my own code where the compiler can't convert from XMLCh* to wchar_t* etc. So my questions are as follows: 1. Is wchar_t something that is still supported by Xerces C++? Yes. In cmake/XercesXMLCh.cmake we check for char16_t and wchar_t support. We default to using a 16-bit integer type i.e. uint16_t, but enable char16_t if available, and on Windows we fall back to wchar_t if char16_t is unsupported (on other platforms wchar_t is 32-bit and unsuitable). Most non-Windows platforms will be using uint16_t, or char16_t if using a new compiler (and cmake). 2. What's the correct way of defining XMLCh as wchar_t instead of char16_t? I changed the definition of XERCES_XMLCH_T in Xerces_autoconf_config.hpp, but then not all of the tests compiled. I'm guessing I need to do something in one of the CMake files? The above file would be the place to do so. Just add set(HAVE_STD_char16_t FALSE) after the check. 3. Are there any real reasons why I shouldn't attempt this? No, it's absolutely still supported. One thing we might want to do here is to add an option to allow the use of wchar_t to be forced, so you could configure with e.g. -Dxmlch=wchar_t or -Dxmlch-char16_t=OFF. An interesting question is what the default policy should be. 
- Using wchar_t is good for compatibility with older Visual Studio versions and other Windows APIs - char16_t is good for cross-platform compatibility since we have the same XMLCh type on all modern platforms, and we can also use Unicode string literals directly in our sources e.g. u"A UTF-16 string", which makes using Xerces-C++ vastly more pleasant. I'd be interested to know your requirements for needing to use wchar_t. Also, for anyone else using char16_t/wchar_t what your needs and preferences are. Portable Xerces-C++ programs should work transparently with either, up to the point you need to pass the strings to other library APIs. Kind regards, Roger