Re: Integrating CMake support for xerces

2017-04-25 Thread Cantor, Scott
On 4/25/17, 8:30 PM, "Cantor, Scott" wrote:

> So far there is very little divergence, just a few small API additions that 
> are unique to the trunk. So I don't foresee anything
> terribly risky about releasing this after some additional fixes, some 
> testing, and incorporating your patch.

Other than some things I have to port up from the branch and other bug reports 
that have come in, the two big commits on trunk are:

r1517488 (XERCESC-2016)
r1528170 (XERCESC-2019)

The former is a patch that's pretty invasive to add XML 1.0 5th edition 
support, which I surmise actually removes a lot of the special handling of XML 
1.1. All of that is outside my expertise, so I don't have any insight into how 
risky that change is or how well it was tested. For myself I don't need it at 
all and would as soon undo it if it can't be verified as safe, but I'm not 
suggesting that exactly, just noting it's significant.

The latter is smaller and is a change to memory handling of text buffers in the 
DOM. I haven't fully grokked that yet but I doubt it's a big deal, just worth a 
look.

Everything else on trunk now that's not on the branch is much simpler and I 
don't see as risky.

-- Scott



-
To unsubscribe, e-mail: c-dev-unsubscr...@xerces.apache.org
For additional commands, e-mail: c-dev-h...@xerces.apache.org



[jira] [Updated] (XERCESC-2052) TranscodeToStr constructor throws TranscodingException claiming an invalid multi byte sequence when it is valid

2017-04-25 Thread Scott Cantor (JIRA)

 [ 
https://issues.apache.org/jira/browse/XERCESC-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Cantor updated XERCESC-2052:
--
Affects Version/s: 3.1.0
   3.1.1

> TranscodeToStr constructor throws TranscodingException claiming an invalid 
> multi byte sequence when it is valid
> ---
>
> Key: XERCESC-2052
> URL: https://issues.apache.org/jira/browse/XERCESC-2052
> Project: Xerces-C++
>  Issue Type: Bug
>  Components: Utilities
>Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.1.3, 3.1.4
> Environment: Windows 32 and 64 bit compiled with VS2010
>Reporter: Nigel Meachen
>Assignee: Alberto Massari
> Fix For: 3.2.0
>
>
> The following constructor throws a TranscodingException:
> TranscodeToStr tTransCoder (L"中国制造 / 中國製造", "UTF-8", 
> XMLPlatformUtils::fgMemoryManager);
> The code in TranscodeToStr::transcode allocates 26 bytes when 27 are needed; 
> however, it does not reach the reallocation logic because trans->transcodeTo 
> returns charsRead as zero. This only occurs in a Release build.
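
The 27-byte figure in the report can be checked by hand: the string has eight CJK characters (three UTF-8 bytes each) and three ASCII characters. The following is a minimal sketch of that arithmetic, assuming 16-bit code units as on Windows; utf8LengthOf and kReportedString are hypothetical names for illustration and are not part of the Xerces-C++ API or of the actual fix.

```cpp
#include <cassert>
#include <cstddef>

// Number of UTF-8 bytes needed to encode a NUL-terminated UTF-16 string.
// Hypothetical helper for illustration; not part of Xerces-C++.
// Unpaired surrogates are leniently counted as three bytes.
std::size_t utf8LengthOf(const char16_t* src)
{
    assert(src != nullptr);
    std::size_t bytes = 0;
    for (std::size_t i = 0; src[i] != 0; ++i) {
        const char16_t c = src[i];
        if (c < 0x80) {
            bytes += 1;                  // ASCII
        } else if (c < 0x800) {
            bytes += 2;                  // two-byte sequence
        } else if (c >= 0xD800 && c <= 0xDBFF &&
                   src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
            bytes += 4;                  // surrogate pair: one four-byte sequence
            ++i;                         // consume the low surrogate as well
        } else {
            bytes += 3;                  // rest of the BMP, including CJK
        }
    }
    return bytes;
}

// The string from the report, written with escapes so that the source
// file encoding does not matter.
const char16_t kReportedString[] =
    u"\u4E2D\u56FD\u5236\u9020 / \u4E2D\u570B\u88FD\u9020";
```

Calling utf8LengthOf(kReportedString) gives 27, so a 26-byte output buffer is one byte short of holding the whole result and the transcoder has to rely on the grow-and-retry path.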



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)




[jira] [Resolved] (XERCESC-2052) TranscodeToStr constructor throws TranscodingException claiming an invalid multi byte sequence when it is valid

2017-04-25 Thread Scott Cantor (JIRA)

 [ 
https://issues.apache.org/jira/browse/XERCESC-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Cantor resolved XERCESC-2052.
---
Resolution: Fixed

Re-closing since it appears we're closing the 3.1 branch.

> TranscodeToStr constructor throws TranscodingException claiming an invalid 
> multi byte sequence when it is valid
> ---
>
> Key: XERCESC-2052
> URL: https://issues.apache.org/jira/browse/XERCESC-2052
> Project: Xerces-C++
>  Issue Type: Bug
>  Components: Utilities
>Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.1.3, 3.1.4
> Environment: Windows 32 and 64 bit compiled with VS2010
>Reporter: Nigel Meachen
>Assignee: Alberto Massari
> Fix For: 3.2.0
>
>
> The following constructor throws a TranscodingException:
> TranscodeToStr tTransCoder (L"中国制造 / 中國製造", "UTF-8", 
> XMLPlatformUtils::fgMemoryManager);
> The code in TranscodeToStr::transcode allocates 26 bytes when 27 are needed; 
> however, it does not reach the reallocation logic because trans->transcodeTo 
> returns charsRead as zero. This only occurs in a Release build.






[jira] [Updated] (XERCESC-2052) TranscodeToStr constructor throws TranscodingException claiming an invalid multi byte sequence when it is valid

2017-04-25 Thread Scott Cantor (JIRA)

 [ 
https://issues.apache.org/jira/browse/XERCESC-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Cantor updated XERCESC-2052:
--
Fix Version/s: (was: 3.1.5)

> TranscodeToStr constructor throws TranscodingException claiming an invalid 
> multi byte sequence when it is valid
> ---
>
> Key: XERCESC-2052
> URL: https://issues.apache.org/jira/browse/XERCESC-2052
> Project: Xerces-C++
>  Issue Type: Bug
>  Components: Utilities
>Affects Versions: 3.1.2, 3.1.3, 3.1.4
> Environment: Windows 32 and 64 bit compiled with VS2010
>Reporter: Nigel Meachen
>Assignee: Alberto Massari
> Fix For: 3.2.0
>
>
> The following constructor throws a TranscodingException:
> TranscodeToStr tTransCoder (L"中国制造 / 中國製造", "UTF-8", 
> XMLPlatformUtils::fgMemoryManager);
> The code in TranscodeToStr::transcode allocates 26 bytes when 27 are needed; 
> however, it does not reach the reallocation logic because trans->transcodeTo 
> returns charsRead as zero. This only occurs in a Release build.






[jira] [Updated] (XERCESC-2050) wrong use of delete keyword in DTest.cpp

2017-04-25 Thread Scott Cantor (JIRA)

 [ 
https://issues.apache.org/jira/browse/XERCESC-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Cantor updated XERCESC-2050:
--
Affects Version/s: 3.1.0
   3.1.1
   3.1.3
   3.1.4

> wrong use of delete keyword in DTest.cpp
> 
>
> Key: XERCESC-2050
> URL: https://issues.apache.org/jira/browse/XERCESC-2050
> Project: Xerces-C++
>  Issue Type: Bug
>  Components: Samples/Tests
>Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.1.3, 3.1.4
>Reporter: Hanno Böck
>Assignee: Alberto Massari
>Priority: Minor
> Fix For: 3.2.0
>
> Attachments: xerces-c-fix-alloc-dealloc.diff
>
>
> In the file DTest.cpp there is a wrong use of the delete keyword. The 
> variable hugeString is allocated with:
> char* hugeString=new char[HUGE_STRING+1];
> It gets deallocated with:
> delete hugeString;
> When allocating a variable with "new type[size]" one has to deallocate with 
> "delete [] variable". These kinds of errors can be seen when compiling with 
> address sanitizer. I'll attach a patch.
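
The mismatch described in the report can be sketched as follows. This is an illustration only, not the attached patch: kHugeString is a hypothetical stand-in for the HUGE_STRING constant in DTest.cpp (its real value is not shown in the report), and the std::unique_ptr variant assumes a C++11 compiler, which the Xerces-C++ code base itself does not require.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical size; the real HUGE_STRING value is not given in the report.
const std::size_t kHugeString = 1024;

// Fills a buffer and returns its length, releasing the array with the
// matching delete[] form.
std::size_t fillWithManualDelete()
{
    char* hugeString = new char[kHugeString + 1];
    std::memset(hugeString, 'A', kHugeString);
    hugeString[kHugeString] = '\0';
    const std::size_t length = std::strlen(hugeString);
    assert(length == kHugeString);
    delete [] hugeString;   // array form matches new char[...]
    return length;
}

// Same work, but the array is owned by std::unique_ptr<char[]>, whose
// default deleter calls delete[] automatically when it goes out of scope.
std::size_t fillWithUniquePtr()
{
    std::unique_ptr<char[]> hugeString(new char[kHugeString + 1]);
    std::memset(hugeString.get(), 'A', kHugeString);
    hugeString[kHugeString] = '\0';
    return std::strlen(hugeString.get());
}
```

Both functions do the same work; the second one removes the chance of pairing new[] with the wrong delete form, at the cost of requiring C++11.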






[jira] [Updated] (XERCESC-2019) Error in memory allocation for even small messages.

2017-04-25 Thread Scott Cantor (JIRA)

 [ 
https://issues.apache.org/jira/browse/XERCESC-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Cantor updated XERCESC-2019:
--
Affects Version/s: 3.1.0
   3.1.2
   3.1.3
   3.1.4

> Error in memory allocation for even small messages.
> ---
>
> Key: XERCESC-2019
> URL: https://issues.apache.org/jira/browse/XERCESC-2019
> Project: Xerces-C++
>  Issue Type: Bug
>  Components: DOM
>Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.1.3, 3.1.4
> Environment: windows XP vc9, linux and solaris
>Reporter: mahalakshmi
>Assignee: Alberto Massari
>  Labels: patch
> Fix For: 3.2.0
>
> Attachments: zippedfiles.zip
>
>
> I have an XSD schema from which I created my standard XML document.
> After creating the XML, I fill in the values for each tag.
> I get a memory allocation error when I traverse the XML and set the value 
> for each tag.
> The error occurs when my program calls setTextContent() in Xerces. It 
> crashes in DOMDocumentImpl.cpp, in allocate(size) (the second if condition 
> in that function).
> Is there any setting needed for memory allocation? What is the maximum size 
> of XML that Xerces can parse? How do we manage memory allocation and 
> deallocation in the DOM?






[jira] [Updated] (XERCESC-2016) XML 1.0 5th edition support

2017-04-25 Thread Scott Cantor (JIRA)

 [ 
https://issues.apache.org/jira/browse/XERCESC-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Cantor updated XERCESC-2016:
--
Affects Version/s: (was: 3.1.1)

> XML 1.0 5th edition support
> ---
>
> Key: XERCESC-2016
> URL: https://issues.apache.org/jira/browse/XERCESC-2016
> Project: Xerces-C++
>  Issue Type: Improvement
>  Components: Non-Validating Parser
> Environment: All
>Reporter: Rob Cameron
>Assignee: Alberto Massari
> Fix For: 3.2.0
>
> Attachments: diff5e
>
>
> Xerces-C currently applies XML 1.0 4th edition rules to name characters
> in XML 1.0 documents. XML 1.0 5th edition permits a broader class
> of name characters, based on those permitted in XML 1.1.
> Proposal: that Xerces-C 3.2.0 be updated to include support for XML 1.0
> 5th edition.
> Although our main work is with icXML, we've looked at making this change
> in Xerces-C original code base so that icXML support for XML 1.0 5e is
> compatible with us.
> I'm not entirely sure that I've handled everything, but the following change
> works in our test.  The change plan is below and a svn diff file is
> attached.
> Here is the change plan.
> --
> (1)  internal/CharTypeTables.hpp
> Rename gFirstNameChars1_1 to be gFirstNameChars
> Rename gNameChars1_1 to be gNameChars
> (2) util/XMLChar.cpp
> (2a)
>Update initCharFlagTable1_1() to use the gFirstNameChars, gNameChars
>Update initCharFlagTable() to use the set-ups from initCharFlagTable1_1()
>  to define gNameCharMask, gNCNameCharMask, and gFirstNameCharMask.
> //
> //  Name characters are special. A name is made up of a number of
> //  different tables and some special case characters.
> //
> initOneTable(gNameChars, gNameCharMask);
> //
> //  Name characters are special. A name is made up of a number of
> //  different tables and some special case characters.
> //
> initOneTable(gNameChars, gNCNameCharMask);
> gTmpCharTable[chColon] &= ~gNCNameCharMask;
> //
> //  Then do the first name char
> //
> initOneTable(gFirstNameChars, gFirstNameCharMask);
> (2b) #define NEED_TO_GEN_TABLE
> compile and do a sample run of a Xerces app, generate table.out
> (2c) Replace the XMLChar1_0::fgCharCharsTable1_0 definition in XMLChar.cpp
> with that from table.out.
> (3) XMLChar.hpp
> Modify XMLChar1_0::isFirstNameChar, XMLChar1_0::isFirstNCNameChar,
> XMLChar1_0::isNameChar, XMLChar1_0::isNCNameChar
> to each check for and allow characters in the #x10000-#xEFFFF range
> else {
> if ((toCheck >= 0xD800) && (toCheck <= 0xDB7F))
>if ((toCheck2 >= 0xDC00) && (toCheck2 <= 0xDFFF))
>return true;
> }
> (4)  Modify XMLReader::getName and XMLReader::getNCName
>to allow surrogate pairs in Names and NCNames
>(i.e., use the version 1.1 logic for both 1.0 and 1.1).
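
The surrogate bounds in step (3) can be related to code points with the standard UTF-16 decoding formula: high surrogates 0xD800 through 0xDB7F paired with any low surrogate cover U+10000 through U+EFFFF. A minimal sketch, assuming nothing about the real XMLChar1_0 members; isSupplementaryNameChar and decodeSurrogatePair are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// True when (high, low) is a UTF-16 surrogate pair within the bounds quoted
// in the change plan: high in 0xD800-0xDB7F, low in 0xDC00-0xDFFF.
// Hypothetical helper; not a Xerces-C++ function.
bool isSupplementaryNameChar(std::uint16_t high, std::uint16_t low)
{
    return high >= 0xD800 && high <= 0xDB7F &&
           low  >= 0xDC00 && low  <= 0xDFFF;
}

// Standard UTF-16 decoding of a surrogate pair into a code point.
std::uint32_t decodeSurrogatePair(std::uint16_t high, std::uint16_t low)
{
    assert(high >= 0xD800 && high <= 0xDBFF);
    assert(low >= 0xDC00 && low <= 0xDFFF);
    return 0x10000u
         + ((static_cast<std::uint32_t>(high) - 0xD800u) << 10)
         + (static_cast<std::uint32_t>(low) - 0xDC00u);
}
```

The lowest accepted pair (0xD800, 0xDC00) decodes to 0x10000 and the highest (0xDB7F, 0xDFFF) to 0xEFFFF; the next high surrogate, 0xDB80, starts at 0xF0000 and is rejected by the check.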






Re: Integrating CMake support for xerces

2017-04-25 Thread Cantor, Scott
On 4/25/17, 3:17 PM, "Roger Leigh" wrote:

> Switching to git would be wonderful.  We could also enable CI testing 
> with e.g. Travis or some other CI service on github at that time to 
> enable testing of all PRs, if that would be acceptable.  Or does the 
> Apache project provide any equivalent services internally?

There are already mirrors of the code at git.apache.org (and to github from 
there), and of course all CI tools can pull from svn just as easily as git. 
That's never been an impediment. I don't know if there are tests sufficient to 
be worth exercising like that or not.

> Regarding (3), it's a bit outside the scope of this CMake ticket.  My 
> intentions here were to get a build system which would provide a working 
> build on all platforms, including the unit tests.  I didn't want to go 
> down the rabbit hole at the same time.  Ideally, if we merge this to the 
> trunk and branch off a 3.2 and release that, more adventurous changes 
> could be then done on the trunk.  I'd rather have a working release with 
> the CMake support included than to do both and not have an immediately 
> usable and API compatible release!

+1

I wasn't suggesting anything else, and it makes sense to go ahead and branch 
again if there's going to be any real screwing around, I need a stable branch 
myself.

I have made some progress today after a few hours reviewing trunk and I'm only 
about 10 commits back from when I started cherry picking things back to the 3.1 
branch, at which point the trunk essentially froze. So far there is very little 
divergence, just a few small API additions that are unique to the trunk. So I 
don't foresee anything terribly risky about releasing this after some 
additional fixes, some testing, and incorporating your patch.

> That said, I'd not be averse to including support for standard C++; 
> using Xerces is often a bugbear due to its age.  All our code is now 
> C++11, with RAII wrappers to make Xerces play nicely.  Primarily the 
> lack of RAII, non-standard exception types, odd memory management 
> semantics and transcoding all input.

The problem with C++11 is it's just not portable to enough compilers outside of 
Windows. I'm aware gcc probably supports it but gcc on actual Linux distros 
that people still use heavily does not. If I can't build it on RH6 it's not 
usable for me, and since I'm the one doing most of the work right now...

Really, C++11 is beside the point. Simply good old C++ would fix many issues, 
but this code dates to back when using real C++ and the STL was just too 
non-portable, along with the usual Unix anti-C++ bias.

> Something worth noting is that our 
> (optional) ICU dependency switched to requiring C++11 with ICU 59.1.  It 
> switched to using the standard char16_t as its XML string type.  If 
> Xerces were to also switch (or at least use a suitable typedef), we 
> could be using const char16_t* foo = u"UTF-16 strings" and/or u8"UTF-8" 
> strings directly in both the xerces sources and in client programs.  A 
> major usability improvement.

At a huge cost in portability unfortunately. Believe me, I wish that were 
viable for me. So, so much.

> In a recent performance testing exercise at work, we found string 
> transcoding inside xerces-c to be a major time sink--using valgrind 
> callgrind--it was one of the major uses of CPU time during parsing and 
> DOM processing.  It was slower than xerces-j for the same operations, 
> and this was likely to be a major cause.

I'm not sure that you're going to fix that. It's already using UTF-16 
internally. If there are problems with transcoding, I think that's just the 
cost of transcoding, I don't think the need to transcode goes away unless I'm 
missing something.

Anyway, within a week or two I expect to be able to put trunk in a position to 
accept your patch and we can continue on from there.

-- Scott




Re: Integrating CMake support for xerces

2017-04-25 Thread Roger Leigh

On 25/04/2017 18:56, Cantor, Scott wrote:

>> Since we are sharing plans, we (as in Code Synthesis) are planning
>> to package Xerces-C++ for build2[1] in the near future (but no
>> definite time-frame). While I haven't looked into this closely
>> yet, the options we consider range between just packaging it as
>> is to pretty much forking it. The main reasons for forking would
>> be: (1) to switch to git (life is just too short for svn), (2)
>> to get rid of the Apache bureaucracy, and (3) rip all the legacy
>> parts out and clean things up (maybe even switching to C++11/14).
>
> (1) doesn't matter to me, but +1000 to (2) and I have very little compunction 
> about (3), aside from the obvious fact that once you start pulling that thread, 
> you're on slippery ground.
>
> I wasn't prepared to really go so far as to start tossing things out or 
> proposing really invasive changes but it sounds like cleaning up and releasing 
> the trunk would serve both short term and longer term ends here.


Switching to git would be wonderful.  We could also enable CI testing 
with e.g. Travis or some other CI service on github at that time to 
enable testing of all PRs, if that would be acceptable.  Or does the 
Apache project provide any equivalent services internally?


Regarding (3), it's a bit outside the scope of this CMake ticket.  My 
intentions here were to get a build system which would provide a working 
build on all platforms, including the unit tests.  I didn't want to go 
down the rabbit hole at the same time.  Ideally, if we merge this to the 
trunk and branch off a 3.2 and release that, more adventurous changes 
could be then done on the trunk.  I'd rather have a working release with 
the CMake support included than to do both and not have an immediately 
usable and API compatible release!


That said, I'd not be averse to including support for standard C++; 
using Xerces is often a bugbear due to its age.  All our code is now 
C++11, with RAII wrappers to make Xerces play nicely.  Primarily the 
lack of RAII, non-standard exception types, odd memory management 
semantics and transcoding all input.  Something worth noting is that our 
(optional) ICU dependency switched to requiring C++11 with ICU 59.1.  It 
switched to using the standard char16_t as its XML string type.  If 
Xerces were to also switch (or at least use a suitable typedef), we 
could be using const char16_t* foo = u"UTF-16 strings" and/or u8"UTF-8" 
strings directly in both the xerces sources and in client programs.  A 
major usability improvement.
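
To make the RAII point concrete, here is a minimal sketch of the kind of wrapper meant, assuming a C-style acquire/release pair similar in spirit to the XMLString::transcode() and XMLString::release() pattern. acquireCopy, releaseCopy, ScopedCopy and g_liveBuffers are hypothetical stand-ins so that the example compiles without Xerces; this is not our actual wrapper code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Counts buffers that have been acquired but not yet released.
int g_liveBuffers = 0;

// Hypothetical stand-in for a library call that returns an owned buffer.
char* acquireCopy(const char* text)
{
    assert(text != nullptr);
    char* copy = new char[std::strlen(text) + 1];
    std::strcpy(copy, text);
    ++g_liveBuffers;
    return copy;
}

// Hypothetical stand-in for the matching release call.
void releaseCopy(char** buffer)
{
    delete [] *buffer;
    *buffer = nullptr;
    --g_liveBuffers;
}

// RAII owner: acquires in the constructor, releases in the destructor,
// and is non-copyable so the buffer is released exactly once.
class ScopedCopy
{
public:
    explicit ScopedCopy(const char* text) : buffer_(acquireCopy(text)) {}
    ~ScopedCopy() { releaseCopy(&buffer_); }
    ScopedCopy(const ScopedCopy&) = delete;
    ScopedCopy& operator=(const ScopedCopy&) = delete;
    const char* get() const { return buffer_; }
private:
    char* buffer_;
};

// The buffer is released when 'copy' goes out of scope, including on
// early return or when an exception propagates.
std::size_t lengthViaScopedCopy(const char* text)
{
    ScopedCopy copy(text);
    return std::strlen(copy.get());
}
```

After lengthViaScopedCopy("xerces") returns 6, g_liveBuffers is back to zero without any explicit release call at the call site.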


In a recent performance testing exercise at work, we found string 
transcoding inside xerces-c to be a major time sink--using valgrind 
callgrind--it was one of the major uses of CPU time during parsing and 
DOM processing.  It was slower than xerces-j for the same operations, 
and this was likely to be a major cause.


Certainly cleaning up and releasing trunk would be a step towards any of 
that, should there be a consensus for that.



Regards,
Roger





[jira] [Commented] (XERCESC-2077) Add CMake build system

2017-04-25 Thread Scott Cantor (JIRA)

[ 
https://issues.apache.org/jira/browse/XERCESC-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15983444#comment-15983444
 ] 

Scott Cantor commented on XERCESC-2077:
---

I doubt anything you did would be a risk, but any testing of trunk for any 
reason is incredibly welcome because that's the main sticking point to getting 
it released.

> Add CMake build system
> --
>
> Key: XERCESC-2077
> URL: https://issues.apache.org/jira/browse/XERCESC-2077
> Project: Xerces-C++
>  Issue Type: New Feature
>  Components: Build
>Affects Versions: 3.1.4
> Environment: All
>Reporter: Roger Leigh
>  Labels: build, cmake, patch
> Attachments: 0001-cmake-Add-CMake-build-system.patch, 
> 0001-cmake-Add-CMake-build-system-trunk.patch
>
>
> h4. Introduction
> The attached patch implements a CMake build for Xerces-C++.
> I have spent significant effort performing a "comprehensive" conversion of 
> the existing GNU autotools and MSVC project file logic to a unified CMake 
> build which supports all platforms with a single set of build files, as well 
> as testing it exhaustively (see below). The existing GNU autotools build and 
> MSVC project builds will continue to function and are unaffected by this 
> addition.
> h5. References
> - http://mail-archives.apache.org/mod_mbox/xerces-c-dev/201302.mbox/browser
> - http://mail-archives.apache.org/mod_mbox/xerces-c-dev/201506.mbox/browser
> - https://github.com/rleigh-codelibre/xerces-c/tree/cmake-3.1
> h4. Background
> CMake is a meta-build system which generates the build files for a specified 
> build system, such as make, Visual Studio msbuild, nmake, ninja or a number 
> of other build tools and IDEs.  This allows Xerces-C++ to be built on any 
> supported platform with the native tools for that platform.
> The reason why I originally needed this was due to the large maintenance 
> burden of patching the provided Visual Studio project files, both for fixing 
> bugs in those files and in being able to support versions of Visual Studio 
> which aren't yet supported by the provided project files or for unsupported 
> configurations e.g. Clang/C2, other platforms etc.  The lack of an install 
> target also meant that to integrate this with a larger build required 
> manually copying bits out of the build tree.  The cost of debugging and 
> patching the existing project files for use in our CI builds was getting too 
> great--maintaining and using this CMake build out of tree will be cheaper and 
> more robust.  However, given that other people have also requested such 
> support in the past, I thought it might benefit others to have this merged 
> upstream so that it would be available to the benefit of all.
> I have done a direct conversion of every autoconf option and feature test.  
> Where there wasn't a direct CMake equivalent, I've written each feature test 
> to exactly match the autoconf behaviour.  The automake Makefile.am logic is 
> directly represented in the corresponding CMakeLists.txt files.  Broadly:
> ||Autotools||CMake||
> |{{configure.ac}}, {{Makefile.am}}|{{CMakeLists.txt}}|
> |{{*/Makefile.am}}|{{*/CMakeLists.txt}}|
> |{{m4/*}}|{{cmake/*}}|
> |{{src/xercesc/util/Xerces_autoconf_config.hpp.in}}|{{src/xercesc/util/Xerces_autoconf_config.hpp.cmake.in}}|
> |_autoheader_|config.h.cmake.in|
> |{{tools/createdocs.sh}}|{{CMakeLists.txt}} (custom target)|
> |{{scripts/sanityTest.pl}}|{{cmake/XercesTest.cmake}} (direct support)|
> |{{scripts/sanityTest_ExpectedResult.log}}|{{test/expected/\*}}, 
> {{samples/expected/\*}} (individual log files)|
> And there's a section added to the documentation giving an overview of how to 
> use it, in the same style as the autotools section.
> h5. Enhancements over the existing build systems
> - Universal build for any platform and build system supported by CMake
> - Full support for feature and library detection on Windows, including
>   discovery of ICU libraries; it's no longer static, using (long broken)
>   ICU configurations in the project files
> - An install target now exists on Windows, so the various pieces don't
>   need manually copying out of the build tree
> - Parallel build speed improvements when using ninja to replace make
>   or msbuild; the speedup with the latter is significant
> - Export of CMake configuration in addition to pkg-config, to make
>   Xerces-C++ integrate with downstream projects using Xerces-C++ and
>   cmake; this includes all dependency information of the libraries
>   Xerces was linked with, i.e. transitive dependencies.
> - Installs the HTML documentation
> - Targets are provided for regenerating the documentation (docs and
>   apidocs)
> - Documentation can be edited and rebuilt from within Visual Studio
> - Unit tests can be run on all supported platforms
> - Unit tests can be run in parallel
> - Unit 

[jira] [Commented] (XERCESC-2077) Add CMake build system

2017-04-25 Thread Roger Leigh (JIRA)

[ 
https://issues.apache.org/jira/browse/XERCESC-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15983441#comment-15983441
 ] 

Roger Leigh commented on XERCESC-2077:
--

Now rebased onto trunk.  I'm looking at re-running our test CI jobs on the new 
branch to verify it's working the same as the older patch for 3.1, but given 
that there are zero changes I don't expect any problems--it passed all the 
tests on the platform I tested it on by hand.

> Add CMake build system
> --
>
> Key: XERCESC-2077
> URL: https://issues.apache.org/jira/browse/XERCESC-2077
> Project: Xerces-C++
>  Issue Type: New Feature
>  Components: Build
>Affects Versions: 3.1.4
> Environment: All
>Reporter: Roger Leigh
>  Labels: build, cmake, patch
> Attachments: 0001-cmake-Add-CMake-build-system.patch, 
> 0001-cmake-Add-CMake-build-system-trunk.patch
>
>
> h4. Introduction
> The attached patch implements a CMake build for Xerces-C++.
> I have spent significant effort performing a "comprehensive" conversion of 
> the existing GNU autotools and MSVC project file logic to a unified CMake 
> build which supports all platforms with a single set of build files, as well 
> as testing it exhaustively (see below). The existing GNU autotools build and 
> MSVC project builds will continue to function and are unaffected by this 
> addition.
> h5. References
> - http://mail-archives.apache.org/mod_mbox/xerces-c-dev/201302.mbox/browser
> - http://mail-archives.apache.org/mod_mbox/xerces-c-dev/201506.mbox/browser
> - https://github.com/rleigh-codelibre/xerces-c/tree/cmake-3.1
> h4. Background
> CMake is a meta-build system which generates the build files for a specified 
> build system, such as make, Visual Studio msbuild, nmake, ninja or a number 
> of other build tools and IDEs.  This allows Xerces-C++ to be built on any 
> supported platform with the native tools for that platform.
> The reason why I originally needed this was due to the large maintenance 
> burden of patching the provided Visual Studio project files, both for fixing 
> bugs in those files and in being able to support versions of Visual Studio 
> which aren't yet supported by the provided project files or for unsupported 
> configurations e.g. Clang/C2, other platforms etc.  The lack of an install 
> target also meant that to integrate this with a larger build required 
> manually copying bits out of the build tree.  The cost of debugging and 
> patching the existing project files for use in our CI builds was getting too 
> great--maintaining and using this CMake build out of tree will be cheaper and 
> more robust.  However, given that other people have also requested such 
> support in the past, I thought it might benefit others to have this merged 
> upstream so that it would be available to the benefit of all.
> I have done a direct conversion of every autoconf option and feature test.  
> Where there wasn't a direct CMake equivalent, I've written each feature test 
> to exactly match the autoconf behaviour.  The automake Makefile.am logic is 
> directly represented in the corresponding CMakeLists.txt files.  Broadly:
> ||Autotools||CMake||
> |{{configure.ac}}, {{Makefile.am}}|{{CMakeLists.txt}}|
> |{{*/Makefile.am}}|{{*/CMakeLists.txt}}|
> |{{m4/*}}|{{cmake/*}}|
> |{{src/xercesc/util/Xerces_autoconf_config.hpp.in}}|{{src/xercesc/util/Xerces_autoconf_config.hpp.cmake.in}}|
> |_autoheader_|config.h.cmake.in|
> |{{tools/createdocs.sh}}|{{CMakeLists.txt}} (custom target)|
> |{{scripts/sanityTest.pl}}|{{cmake/XercesTest.cmake}} (direct support)|
> |{{scripts/sanityTest_ExpectedResult.log}}|{{test/expected/\*}}, 
> {{samples/expected/\*}} (individual log files)|
> And there's a section added to the documentation giving an overview of how to 
> use it, in the same style as the autotools section.
> h5. Enhancements over the existing build systems
> - Universal build for any platform and build system supported by CMake
> - Full support for feature and library detection on Windows, including
>   discovery of ICU libraries; it's no longer static, using (long broken)
>   ICU configurations in the project files
> - An install target now exists on Windows, so the various pieces don't
>   need manually copying out of the build tree
> - Parallel build speed improvements when using ninja to replace make
>   or msbuild; the speedup with the latter is significant
> - Export of CMake configuration in addition to pkg-config, to make
>   Xerces-C++ integrate with downstream projects using Xerces-C++ and
>   cmake; this includes all dependency information of the libraries
>   Xerces was linked with, i.e. transitive dependencies.
> - Installs the HTML documentation
> - Targets are provided for regenerating the documentation (docs and
>   apidocs)
> - Documentation can be edited and rebuilt from within Visual 

[jira] [Updated] (XERCESC-2077) Add CMake build system

2017-04-25 Thread Roger Leigh (JIRA)

 [ 
https://issues.apache.org/jira/browse/XERCESC-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roger Leigh updated XERCESC-2077:
-
Attachment: 0001-cmake-Add-CMake-build-system-trunk.patch

Patch after rebasing onto trunk.  The only changes were fixing three minor 
conflicts in Makefile.am EXTRA_DIST.

> Add CMake build system
> --
>
> Key: XERCESC-2077
> URL: https://issues.apache.org/jira/browse/XERCESC-2077
> Project: Xerces-C++
>  Issue Type: New Feature
>  Components: Build
>Affects Versions: 3.1.4
> Environment: All
>Reporter: Roger Leigh
>  Labels: build, cmake, patch
> Attachments: 0001-cmake-Add-CMake-build-system.patch, 
> 0001-cmake-Add-CMake-build-system-trunk.patch
>
>
> h4. Introduction
> The attached patch implements a CMake build for Xerces-C++.
> I have spent significant effort performing a "comprehensive" conversion of 
> the existing GNU autotools and MSVC project file logic to a unified CMake 
> build which supports all platforms with a single set of build files, as well 
> as testing it exhaustively (see below). The existing GNU autotools build and 
> MSVC project builds will continue to function and are unaffected by this 
> addition.
> h5. References
> - http://mail-archives.apache.org/mod_mbox/xerces-c-dev/201302.mbox/browser
> - http://mail-archives.apache.org/mod_mbox/xerces-c-dev/201506.mbox/browser
> - https://github.com/rleigh-codelibre/xerces-c/tree/cmake-3.1
> h4. Background
> CMake is a meta-build system which generates the build files for a specified 
> build system, such as make, Visual Studio msbuild, nmake, ninja or a number 
> of other build tools and IDEs.  This allows Xerces-C++ to be built on any 
> supported platform with the native tools for that platform.
> The reason why I originally needed this was due to the large maintenance 
> burden of patching the provided Visual Studio project files, both for fixing 
> bugs in those files and in being able to support versions of Visual Studio 
> which aren't yet supported by the provided project files or for unsupported 
> configurations e.g. Clang/C2, other platforms etc.  The lack of an install 
> target also meant that to integrate this with a larger build required 
> manually copying bits out of the build tree.  The cost of debugging and 
> patching the existing project files for use in our CI builds was getting too 
> great--maintaining and using this CMake build out of tree will be cheaper and 
> more robust.  However, given that other people have also requested such 
> support in the past, I thought it might benefit others to have this merged 
> upstream so that it would be available to the benefit of all.
> I have done a direct conversion of every autoconf option and feature test.  
> Where there wasn't a direct CMake equivalent, I've written each feature test 
> to exactly match the autoconf behaviour.  The automake Makefile.am logic is 
> directly represented in the corresponding CMakeLists.txt files.  Broadly:
> ||Autotools||CMake||
> |{{configure.ac}}, {{Makefile.am}}|{{CMakeLists.txt}}|
> |{{*/Makefile.am}}|{{*/CMakeLists.txt}}|
> |{{m4/*}}|{{cmake/*}}|
> |{{src/xercesc/util/Xerces_autoconf_config.hpp.in}}|{{src/xercesc/util/Xerces_autoconf_config.hpp.cmake.in}}|
> |_autoheader_|{{config.h.cmake.in}}|
> |{{tools/createdocs.sh}}|{{CMakeLists.txt}} (custom target)|
> |{{scripts/sanityTest.pl}}|{{cmake/XercesTest.cmake}} (direct support)|
> |{{scripts/sanityTest_ExpectedResult.log}}|{{test/expected/\*}}, {{samples/expected/\*}} (individual log files)|
> The documentation also gains a section giving an overview of how to use the 
> CMake build, in the same style as the autotools section.
> h5. Enhancements over the existing build systems
> - Universal build for any platform and build system supported by CMake
> - Full support for feature and library detection on Windows, including
>   discovery of ICU libraries; this replaces the static (and long broken)
>   ICU configurations in the project files
> - An install target now exists on Windows, so the various pieces no
>   longer need to be copied manually out of the build tree
> - Faster parallel builds when using ninja in place of make or msbuild;
>   the speedup over msbuild is significant
> - Export of CMake configuration in addition to pkg-config, so that
>   downstream projects built with CMake can integrate Xerces-C++; this
>   includes all dependency information for the libraries Xerces-C++ was
>   linked with, i.e. transitive dependencies
> - Installs the HTML documentation
> - Targets are provided for regenerating the documentation (docs and
>   apidocs)
> - Documentation can be edited and rebuilt from within Visual Studio
> - Unit tests can be run on all supported platforms
> - Unit tests can be run in parallel
> - Unit tests verify individual test output 

[jira] [Updated] (XERCESC-1962) memory leak with XInclude

2017-04-25 Thread Scott Cantor (JIRA)

 [ 
https://issues.apache.org/jira/browse/XERCESC-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Cantor updated XERCESC-1962:
--
Affects Version/s: 3.1.0
   3.1.2
   3.1.3
   3.1.4

> memory leak with XInclude
> -
>
> Key: XERCESC-1962
> URL: https://issues.apache.org/jira/browse/XERCESC-1962
> Project: Xerces-C++
>  Issue Type: Bug
>  Components: XInclude
>Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.1.3, 3.1.4
> Environment: Linux Ubuntu 10.10 i386
>Reporter: Paulo Zanoni
>Assignee: Alberto Massari
>  Labels: example, leak, patch,
> Fix For: 3.2.0
>
> Attachments: xinclude-example.xml, xinclude_leak.cpp, 
> xinclude-leak.patch
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> When xerces-c++ is parsing files with XInclude, it leaks memory. You can 
> check the memory leak by running "valgrind".
> This behavior can be reproduced both with the "XInclude" binary and with 
> custom code.
> All you need to do is:
> - create a parser
> - set validation scheme to Val_Auto
> - setDoNamespaces(true)
> - setDoXInclude(true)
> - parser.parse("example.xml");
> I have tracked the bug. The "actual" leak is inside 
> XIncludeUtils::reportError. It uses XMLPlatformUtils::loadMsgSet, which calls 
> loadAMsgSet, which allocates memory and returns it to its caller (see 
> XMLPlatformUtils::loadAMsgSet inside utils/PlatformUtils.cpp). The memory 
> allocated by loadAMsgSet is never freed.
> The solution:
> In file xinclude/XIncludeUtils.cpp, function XIncludeUtils::reportError, the 
> pointer returned by XMLPlatformUtils::loadMsgSet should be freed. For my 
> testing, I just added a "delete erMsgLoader" at the end of the scope, but I 
> am not sure this is enough (since I'm not sure whether any of the functions 
> between loadMsgSet and the end of the scope can throw exceptions). It is up 
> to you, developers, to find a proper solution =D
> I'll attach examples.
> I locally tested my patch (I rebuilt the Ubuntu package) and it seems to have 
> worked, but I didn't test much. I am not familiar with the xerces-c code so 
> I'm not sure if it can break anything.
> Thanks,
> Paulo
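
The exception-safety concern raised in the report (a "delete" at the end of the scope is skipped if anything in between throws) can be illustrated with a small standalone sketch. This is not the Xerces-C++ code or the attached patch: FakeMsgLoader and loadFakeMsgSet are hypothetical stand-ins for the loader returned by XMLPlatformUtils::loadMsgSet, and std::unique_ptr is used only for illustration (Xerces-C++ has its own Janitor template in xercesc/util/Janitor.hpp, which would probably be the more natural choice in the library itself).

```cpp
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the message loader; it only counts live instances
// so that a leak becomes observable.
struct FakeMsgLoader {
    static int live;
    FakeMsgLoader() { ++live; }
    ~FakeMsgLoader() { --live; }
    std::string load(int code) const {
        if (code < 0) throw std::runtime_error("unknown message code");
        return "message " + std::to_string(code);
    }
};
int FakeMsgLoader::live = 0;

// Hypothetical stand-in for loadMsgSet: the caller owns the returned pointer.
FakeMsgLoader* loadFakeMsgSet() { return new FakeMsgLoader(); }

// Shape of the quick fix from the report: correct on the normal path only.
std::string reportErrorManualDelete(int code) {
    FakeMsgLoader* loader = loadFakeMsgSet();
    std::string text = loader->load(code);  // if this throws, the delete is skipped
    delete loader;
    return text;
}

// Owning the pointer in a smart pointer frees it on return and on throw.
std::string reportErrorSafe(int code) {
    std::unique_ptr<FakeMsgLoader> loader(loadFakeMsgSet());
    return loader->load(code);
}
```

With this shape, FakeMsgLoader::live returns to zero after reportErrorSafe whether or not load() throws, which is the property the reporter was unsure about for the manual delete.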



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)




[jira] [Updated] (XERCESC-1962) memory leak with XInclude

2017-04-25 Thread Scott Cantor (JIRA)

 [ 
https://issues.apache.org/jira/browse/XERCESC-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Cantor updated XERCESC-1962:
--
Remaining Estimate: 0h  (was: 2h)
 Original Estimate: 0h  (was: 2h)







[jira] [Updated] (XERCESC-1967) Xerces ignores (deletes, swallow, ignores) the UTF-8 BOM and also ignores the charset parameter of the HTTP content-type: header

2017-04-25 Thread Scott Cantor (JIRA)

 [ 
https://issues.apache.org/jira/browse/XERCESC-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Cantor updated XERCESC-1967:
--
Remaining Estimate: 0h  (was: 4h)
 Original Estimate: 0h  (was: 4h)

> Xerces ignores (deletes, swallow, ignores) the UTF-8 BOM and also ignores the 
> charset parameter of the HTTP content-type: header
> 
>
> Key: XERCESC-1967
> URL: https://issues.apache.org/jira/browse/XERCESC-1967
> Project: Xerces-C++
>  Issue Type: Bug
>  Components: Non-Validating Parser
>Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.1.3, 3.1.4
> Environment: Mac OS X Snow Leopard (Intel).  
> (http://mirrorservice.nomedia.no/apache.org//xerces/c/3/binaries/xerces-c-3.1.1-x86-macosx-gcc-4.0.tar.gz)
> And also tested the XMLmind XML editor on same platform.
>Reporter: Leif Halvard Silli
>Assignee: Alberto Massari
> Fix For: 3.2.0
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> [1] http://www.w3.org/mid/20110609033243875895.0f711...@xn--mlform-iua.no
> [2] http://www.w3.org/mid/20110609090401531862.04ce1...@xn--mlform-iua.no
> It is an XML 1.0 spec violation: a well-formedness violation.
> Test cases without XML declaration: http://malform.no/testing/html5/bom/
> Test cases *with* XML declaration to be added later.
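
For readers unfamiliar with the byte sequence at issue: the UTF-8 BOM is the three bytes EF BB BF at the very start of the document. The following is a minimal sketch of detecting it, not Xerces-C++ code; the function name utf8BomLength is hypothetical, and the sketch says nothing about how the BOM should be weighed against the HTTP charset parameter.

```cpp
#include <cstddef>
#include <string>

// Returns the length of a leading UTF-8 byte order mark (EF BB BF):
// 3 if the input starts with one, 0 otherwise.
std::size_t utf8BomLength(const std::string& bytes) {
    if (bytes.size() < 3) {
        return 0;
    }
    const bool hasBom =
        static_cast<unsigned char>(bytes[0]) == 0xEF &&
        static_cast<unsigned char>(bytes[1]) == 0xBB &&
        static_cast<unsigned char>(bytes[2]) == 0xBF;
    return hasBom ? 3 : 0;
}
```

A caller could use the returned length both as an encoding signal and as the offset at which the document content begins.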






[jira] [Updated] (XERCESC-1967) Xerces ignores (deletes, swallow, ignores) the UTF-8 BOM and also ignores the charset parameter of the HTTP content-type: header

2017-04-25 Thread Scott Cantor (JIRA)

 [ 
https://issues.apache.org/jira/browse/XERCESC-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Cantor updated XERCESC-1967:
--
Affects Version/s: 3.1.0
   3.1.2
   3.1.3
   3.1.4







RE: Integrating CMake support for xerces

2017-04-25 Thread Cantor, Scott
> Since we are sharing plans, we (as in Code Synthesis) are planning
> to package Xerces-C++ for build2[1] in the near future (but no
> definite time-frame). While I haven't looked into this closely
> yet, the options we consider range between just packaging it as
> is to pretty much forking it. The main reasons for forking would
> be: (1) to switch to git (life is just too short for svn), (2)
> to get rid of the Apache bureaucracy, and (3) rip all the legacy
> parts out and clean things up (maybe even switching to C++11/14).

(1) doesn't matter to me, but +1000 to (2) and I have very little compunction 
about (3), aside from the obvious fact that once you start pulling that thread, 
you're on slippery ground.

I wasn't prepared to really go so far as to start tossing things out or 
proposing really invasive changes but it sounds like cleaning up and releasing 
the trunk would serve both short term and longer term ends here.

-- Scott

