Re: Issue with collapsing whitespace in a union during schema validation
Hi, Doug.

This is a very clear and useful test case demonstrating the issue. I think you are correct: your sample file should validate against either schema. I am cc'ing the c-dev@xerces list because this appears to be a bug in Xerces itself.

We've been looking at it in the context of icXML, our accelerated version of Xerces incorporating Parabix (parallel bit stream) technology. Unfortunately, icXML-0.95 also has the bug. In essence, I think the problem is that Xerces/icXML by default processes unions as if they had collapse as the value of the whiteSpace facet. But this is not in accord with the XML Schema spec as I read it. We'll put a fix on the icXML roadmap, and we may be able to submit a patch for Xerces as well.

On Mon, Jan 6, 2014 at 7:02 AM, Glidden, Douglass A <douglass.a.glid...@boeing.com> wrote:

Consider the following XML schema and document:

    <xsd:schema targetNamespace="http://example.com/schema"
                elementFormDefault="qualified"
                xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      <xsd:element name="CannedComment">
        <xsd:simpleType>
          <xsd:restriction base="xsd:string">
            <xsd:enumeration value="This is a canned comment."/>
            <xsd:enumeration value="This is a canned comment.  Notice the double space"/>
          </xsd:restriction>
        </xsd:simpleType>
      </xsd:element>
    </xsd:schema>

    <CannedComment xmlns="http://example.com/schema">This is a canned comment.  Notice the double space</CannedComment>

When Xerces-C++ is used to validate this document against the schema, it passes validation, as expected; however, consider the following modified version of the XML schema:

    <xsd:schema targetNamespace="http://example.com/schema"
                elementFormDefault="qualified"
                xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      <xsd:element name="CannedComment">
        <xsd:simpleType>
          <xsd:union>
            <xsd:simpleType>
              <xsd:restriction base="xsd:string">
                <xsd:minLength value="1"/>
                <xsd:maxLength value="48"/>
              </xsd:restriction>
            </xsd:simpleType>
            <xsd:simpleType>
              <xsd:restriction base="xsd:string">
                <xsd:enumeration value="This is a canned comment."/>
                <xsd:enumeration value="This is a canned comment.  Notice the double space"/>
              </xsd:restriction>
            </xsd:simpleType>
          </xsd:union>
        </xsd:simpleType>
      </xsd:element>
    </xsd:schema>

As far as I can tell, the document above should still successfully validate against this new schema (and incidentally, oXygen agrees with me); however, Xerces-C++ fails the document with the following error message:

    value 'This is a canned comment. Notice the double space' does not match any member types of the union

Notice that, in the error message, the double space has been condensed to a single space. Does this indicate an issue with my schema, an issue in how I am using Xerces, or something else?

Thanks for any help you can offer,
Doug Glidden
Software Engineer
The Boeing Company
douglass.a.glid...@boeing.com
Re: [jira] [Created] (XERCESC-2017) Xerces-C++ is not always able to handle W3C standard keyref
Hi, Mihran.

Can you provide actual files, including a complete xsd file?

On Sat, Jul 6, 2013 at 12:03 AM, Mihran Hovsepyan (JIRA) <xerces-c-...@xml.apache.org> wrote:

Mihran Hovsepyan created XERCESC-2017:
Summary: Xerces-C++ is not always able to handle W3C standard keyref
Key: XERCESC-2017
URL: https://issues.apache.org/jira/browse/XERCESC-2017
Project: Xerces-C++
Issue Type: Bug
Components: Validating Parser (XML Schema)
Affects Versions: 3.1.1
Reporter: Mihran Hovsepyan

I use *Xerces-C++ 3.1.1* to validate XML files against their schema. Below is an example of one such file.

    <CONFIG>
      <DBS>
        <DB ID="D">
          <!--...-->
        </DB>
        <VDB ID="V">
          <!--...-->
          <PARTS>
            <PART_DB ID="V1" />
            <PART_DB ID="V2" />
          </PARTS>
        </VDB>
        <!--...-->
      </DBS>
      <HOSTS>
        <HOST ID="host1">
          <DBS>
            <DB ID="D">
              <!--...-->
            </DB>
            <DB ID="V1">
              <!--...-->
            </DB>
            <DB ID="V2">
              <!--...-->
            </DB>
          </DBS>
          <VDBS>
            <DB ID="V">
              <!--...-->
            </DB>
          </VDBS>
        </HOST>
        <!--...-->
      </HOSTS>
    </CONFIG>

And in its schema, the following key and keyref are defined for the root element `CONFIG`:

    <xsd:key name="DbIdKey">
      <xsd:selector xpath="./DBS/DB|./DBS/VDB|./DBS/VDB/PARTS/PART_DB" />
      <xsd:field xpath="@ID" />
    </xsd:key>
    <xsd:keyref name="DbIdRef" refer="DbIdKey">
      <xsd:selector xpath="./HOSTS/HOST/DBS/DB|./HOSTS/HOST/VDBS/DB" />
      <xsd:field xpath="@ID" />
    </xsd:keyref>

So, although the file meets the requirements of the schema according to *W3C*, and some validators recognize that (for instance, the XML validator of *MS Visual Studio*), *Xerces-C++ 3.1.1* is unable to do so. It complains:

    identity constraint key for element 'CONFIG' not found (last_line, last_column_of_last_line)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

To unsubscribe, e-mail: c-dev-unsubscr...@xerces.apache.org
For additional commands, e-mail: c-dev-h...@xerces.apache.org
[jira] [Created] (XERCESC-2016) XML 1.0 5th edition support
Rob Cameron created XERCESC-2016:
Summary: XML 1.0 5th edition support
Key: XERCESC-2016
URL: https://issues.apache.org/jira/browse/XERCESC-2016
Project: Xerces-C++
Issue Type: Improvement
Components: Non-Validating Parser
Affects Versions: 3.1.1
Environment: All
Reporter: Rob Cameron
Fix For: 3.1.2

Xerces-C currently applies XML 1.0 4th edition rules to name characters in XML 1.0 documents. XML 1.0 5th edition permits a broader class of name characters, based on those permitted in XML 1.1.

Proposal: that Xerces-C 3.2.0 be updated to include support for XML 1.0 5th edition.

Although our main work is with icXML, we've looked at making this change in the original Xerces-C code base so that icXML's support for XML 1.0 5e remains compatible with Xerces-C. I'm not entirely sure that I've handled everything, but the following change works in our tests. The change plan is below, and an svn diff file is attached.

Here is the change plan.
--
(1) internal/CharTypeTables.hpp
    Rename gFirstNameChars1_1 to gFirstNameChars.
    Rename gNameChars1_1 to gNameChars.

(2) util/XMLChar.cpp
(2a) Update initCharFlagTable1_1() to use gFirstNameChars and gNameChars.
     Update initCharFlagTable() to use the set-ups from initCharFlagTable1_1() to define gNameCharMask, gNCNameCharMask, and gFirstNameCharMask:

        //
        // Name characters are special. A name is made up of a number of
        // different tables and some special case characters.
        //
        initOneTable(gNameChars, gNameCharMask);

        //
        // Name characters are special. A name is made up of a number of
        // different tables and some special case characters.
        //
        initOneTable(gNameChars, gNCNameCharMask);
        gTmpCharTable[chColon] &= ~gNCNameCharMask;

        //
        // Then do the first name char
        //
        initOneTable(gFirstNameChars, gFirstNameCharMask);

(2b) #define NEED_TO_GEN_TABLE, compile, and do a sample run of a Xerces app to generate table.out.
(2c) Replace the XMLChar1_0::fgCharCharsTable1_0 definition in XMLChar.cpp with the one from table.out.

(3) XMLChar.hpp
    Modify XMLChar1_0::isFirstNameChar, XMLChar1_0::isFirstNCNameChar, XMLChar1_0::isNameChar, and XMLChar1_0::isNCNameChar to each check for and allow characters in the #x10000-#xEFFFF range (encoded as surrogate pairs):

        else
        {
            if ((toCheck >= 0xD800) && (toCheck <= 0xDB7F))
                if ((toCheck2 >= 0xDC00) && (toCheck2 <= 0xDFFF))
                    return true;
        }

(4) Modify XMLReader::getName and XMLReader::getNCName to allow surrogate pairs in Names and NCNames (i.e., use the version 1.1 logic for both 1.0 and 1.1).
[jira] [Updated] (XERCESC-2016) XML 1.0 5th edition support
[ https://issues.apache.org/jira/browse/XERCESC-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rob Cameron updated XERCESC-2016:
Attachment: diff5e

Here is the patch to add XML 1.0 5e support.

XML 1.0 5th edition support
---
Key: XERCESC-2016
URL: https://issues.apache.org/jira/browse/XERCESC-2016
Project: Xerces-C++
Issue Type: Improvement
Components: Non-Validating Parser
Affects Versions: 3.1.1
Environment: All
Reporter: Rob Cameron
Fix For: 3.1.2
Attachments: diff5e
Re: Regarding Xercesc++ performance
Hi, Rinil.

What is your goal? If you are choosing between Xerces 2.4 and 3.1, here are some other things to think about:
(a) Xerces 3.1 has better support for later XML standards.
(b) Xerces 3.1 has bug fixes over 2.4.
(c) Xerces 3.1 has support for 64-bit architectures.
(d) Any future developments and improvements will likely be made only to the Xerces 3.1 line.

If performance is critical, you may want to consider icXML. This is a highly accelerated version of Xerces 3.1.1 that we are building based on the systematic incorporation of parallel bit stream technology in the underlying engine. icXML substantially speeds up both SAX-based and DOM-based parsing. We will be presenting our work with icXML at Balisage 2013 in Montreal this August.

Rob Cameron
CTO, International Characters, Inc.

On Thu, Jun 13, 2013 at 10:22 PM, Baxi, Rinil Rushabh <rinil.b...@hp.com> wrote:

Hi Dan,

I have checked with both the SAX and DOM parsers and got almost the same results.

Best Regards,
Rinil

From: Huantes, Dan F (TASC) [mailto:dan.huan...@tasc.com]
Sent: Thursday, June 13, 2013 6:20 PM
To: c-dev@xerces.apache.org
Subject: RE: Regarding Xercesc++ performance

Nice work. I'm curious as to whether your performance testing is DOM based, SAX based, or both. I ask because my anecdotal experience is that files exceeding 1 MB take large performance hits due to the inherent nature of the DOM model. In these scenarios, I have used SAX because it's several orders of magnitude faster (i.e. seconds vs minutes). We used 2.8 before but never thought to compare the performance of different versions. You may be on to something.

Thanks.
Dan

From: Baxi, Rinil Rushabh [mailto:rinil.b...@hp.com]
Sent: Thursday, June 13, 2013 4:07 AM
To: c-dev@xerces.apache.org
Subject: Regarding Xercesc++ performance

Hi All,

I have two Xerces-C++ libraries available on my platform (2.4 and 3.1). Both are built without threads. I am trying to compare the performance of the two.

To compare performance, I am using the samples to parse XML files of different sizes (1 KB, 65 KB, 256 KB, 1 MB, 2 MB, 5 MB and 15 MB). I have put each sample in a script and run the same sample 1000 times to compare the parsing time. We observed that up to a file size of 1 MB, Xerces-C++ 3.1 performs better; beyond that, its performance starts to deteriorate. With a 15 MB XML file, the 3.1 sample takes almost 30% more time than the same 2.4 sample.

Please let me know whether this is the right method to measure performance. If not, how should we measure it? One more question: why is there such a performance degradation?

Thanks in advance.

Best Regards,
Rinil
Xerces Performance Acceleration Project: icXML 0.9 is available
I am pleased to announce the availability of public SVN access to the icXML 0.9 source code via svn or Trac. icXML is our modified version of Xerces-C++, systematically restructured to improve performance through the integration of parallel bit stream technology.

svn co http://parabix.costar.sfu.ca/svn/icXML/icXML-0.9
Trac browser: http://parabix.costar.sfu.ca/browser/icXML

As always, we are interested in feedback from Xerces-C++ developers and users.

We are also happy to announce that our paper describing icXML has been accepted for presentation at Balisage 2013 in Montreal, August 6-9. Hope to see you there!

Rob Cameron, CTO, International Characters, Inc
Professor of Computing Science, Simon Fraser University
http://www.international-characters.com/
http://parabix.costar.sfu.ca/
Re: Xerces Performance Acceleration Project: icXML
That's great, Gareth. I do think these lists are the right starting point, and that creating a community of developers and users is key. I think I won't approach the gene...@incubator.apache.org list immediately, because it just seems premature. (But thanks anyway, Michael; it seems like a good route if we aren't able to find a champion through the process of community building.)

We'll certainly be continuing the work within both ICI and the research lab at SFU. Incidentally, for someone with the interest, open-source project work could be combined with graduate studies ...

On Fri, Feb 1, 2013 at 4:20 AM, Gareth Reakes <gar...@we7.com> wrote:

Hey Rob,

We are interested in feedback and interest from developers and potential users. We are also interested in identifying a potential Champion who could help put us on track to become an official Xerces subproject.

If you're actively looking for a champion, you may want to post to the gene...@incubator.apache.org list to get the attention of a larger audience of ASF members who might be interested in that role.

I would be happy to help out here, although posting to that list still makes sense. The key thing we need to do is work on creating a community that will support the ongoing development and maintenance. There are not many people who contribute to Xerces these days, and any reticence you may sense is because we would want to make sure that there was enough interest to support it over the medium/long term. It's not going to be code we know about or can support easily, so the worst thing for us would be to accept the (very generous) code gift but then be unable to support the users that come with it.

On a technical level, I think what you have done is really cool and useful. I would be happy to chat more about this if you want.

Gareth

--
Gareth Reakes, CTO
we7 - Great Free Music
+44-20-7117-0809
http://www.we7.com

The music business is a cruel and shallow money trench, a long plastic hallway where thieves and pimps run free, and good men die like dogs. There's also a negative side.
- Hunter S. Thompson
Xerces Performance Acceleration Project: icXML
icXML is the name of our project to dramatically accelerate Xerces performance on modern commodity processors by taking advantage of SIMD and multicore capabilities and parallel bit stream technology.

We are interested in feedback and interest from developers and potential users. We are also interested in identifying a potential Champion who could help put us on track to become an official Xerces subproject.

Version 0.8 of icXML has been released and is available, together with a development version, on the costar.sfu.ca server.

svn co http://parabix.costar.sfu.ca/svn/icXML/icXML-0.8
svn co http://parabix.costar.sfu.ca/svn/icXML/icXML-devel
Trac browser: http://parabix.costar.sfu.ca/browser/icXML

To get an idea of the performance prospects, here are end-to-end figures for Xerces-C 3.1.1 and icXML with a GML-to-SVG conversion application.

Xerces-C 3.1.1

    Performance counter stats for './gml2svg_3_1_1 ../../data/layer/gml-10 out_3':

    24,444,713,630 instructions:u      #  1.83 insns per cycle   [83.35%]
    13,344,529,298 cycles:u            #  0.000 GHz              [83.33%]
        41,915,991 branch-misses:u     #  0.70% of all branches  [83.33%]
     6,013,112,976 branches:u                                    [83.34%]
        81,290,233 L1-dcache-misses:u                            [83.33%]
       153,198,046 L1-icache-misses:u                            [66.73%]

       3.764054961 seconds time elapsed

icXML

    Performance counter stats for './gml2svg_icx ../../data/layer/gml-10 out_3':

    16,470,263,948 instructions:u      #  1.89 insns per cycle   [83.33%]
     8,707,613,130 cycles:u            #  0.000 GHz              [83.33%]
        13,912,341 branch-misses:u     #  0.43% of all branches  [83.35%]
     3,244,282,034 branches:u                                    [83.33%]
        67,380,609 L1-dcache-misses:u                            [83.33%]
        32,141,837 L1-icache-misses:u                            [66.66%]

       2.554010404 seconds time elapsed

icXML experimental version with 2-thread pipeline parallelism

    Performance counter stats for './gml2svg_icx_pipeline ../../data/layer/gml-10 out_3':

    16,544,368,151 instructions:u      #  1.37 insns per cycle   [84.11%]
    12,060,226,476 cycles:u            #  0.000 GHz              [83.93%]
        13,212,826 branch-misses:u     #  0.39% of all branches  [83.92%]
     3,357,152,226 branches:u                                    [83.78%]
        77,941,092 L1-dcache-misses:u                            [83.10%]
        25,757,287 L1-icache-misses:u                            [67.36%]

       2.180680680 seconds time elapsed

Rob Cameron, CTO, International Characters, Inc
Professor of Computing Science, Simon Fraser University
http://www.international-characters.com/
http://parabix.costar.sfu.ca/
Re: Xerces Performance Acceleration Project: icXML
Hi, Boris.

On Sun, Jan 27, 2013 at 10:18 AM, Boris Kolpackov <bo...@codesynthesis.com> wrote:

I wanted to try icXML with CodeSynthesis XSD[1] for some time now. Just haven't been able to find the time. I have a few questions:

1. It is my understanding that icXML is interface-compatible with Xerces-C++ 3-series. Is that correct?

Yes, this is correct.

2. Have you done any parallelization of the XML Schema validation engine?

This is on our roadmap. We have two forms of parallelization in mind: assigning validation to separate threads (this requires engineering work, but it is quite feasible with our model), and SIMD parallelization of data type and grammar validation (this requires research).

3. You've shown results for icXML in two configurations, single-threaded and with 2 threads. Is there any documentation that describes these extra parameters/options/etc.? In other words, how would I go about specifying the number of threads?

The current icXML release is single-threaded. The experimental two-thread version was a proof of concept; we are presently redesigning icXML to be able to use multiple pipeline stages.

[1] http://www.codesynthesis.com/products/xsd/

Boris
--
Boris Kolpackov, Code Synthesis        http://codesynthesis.com/~boris/blog
Compiler-based ORM system for C++      http://codesynthesis.com/products/odb
Open-source XML data binding for C++   http://codesynthesis.com/products/xsd
XML data binding for embedded systems  http://codesynthesis.com/products/xsde
Version 0.8 of icXML (an accelerated version of Xerces-C++) is available
I am pleased to announce the availability of public SVN access to the icXML 0.8 source code via svn or Trac.

svn co http://parabix.costar.sfu.ca/svn/icXML/icXML-0.8
Trac browser: http://parabix.costar.sfu.ca/browser/icXML

icXML is an effort to systematically accelerate the Xerces-C++ parser by restructuring it to incorporate Parabix (parallel bit stream) technology, while maintaining compatibility with the Xerces APIs. Our target is to offer end-to-end acceleration of a broad array of Xerces applications by 50-100%.

The icXML 0.8 release is a development release intended for two audiences: Xerces-C++ developers who are potentially interested in joining the icXML project, and current Xerces-C++ users who are interested in potentially replacing their Xerces deployments with icXML.

We are presently targeting icXML for 64-bit Linux platforms running on Intel/AMD architectures. The Win64 platform will be supported by icXML version 1.0, as will 32-bit versions, provided that the underlying hardware supports the SSE2 SIMD extensions. Support for the ARM architecture with Neon SIMD extensions is also planned.

Robert D. Cameron, Ph.D.
CTO, International Characters, Inc.