[ 
https://issues.apache.org/jira/browse/GEODE-3306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16106199#comment-16106199
 ] 

ASF GitHub Bot commented on GEODE-3306:
---------------------------------------

GitHub user darrenfoong opened a pull request:

    https://github.com/apache/geode/pull/668

    GEODE-3306: Remove whitespace StringBuffers/nodes created by Apache Xerces
    
    This commit makes Geode compatible with the official Apache Xerces
    implementation, which calls `characters()` when it reads ignorable
    whitespace in `cache.xml`.
    
    The while loop is required to handle comments in `cache.xml`: a
    comment with whitespace before and after it generates two
    whitespace-only StringBuffers on the parse stack (one for each run of
    whitespace). The while loop removes all consecutive whitespace
    StringBuffers from the top of the stack.
    
    ---
    
    Tested with https://github.com/darrenfoong/geode-parser-poc/blob/master/src/test/java/server/ServerTest.java
    
    ---
    
    Thank you for submitting a contribution to Apache Geode.
    
    In order to streamline the review of the contribution we ask you
    to ensure the following steps have been taken:
    
    ### For all changes:
    - [x] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?
    
    - [x] Has your PR been rebased against the latest commit within the target branch (typically `develop`)?
    
    - [x] Is your initial contribution a single, squashed commit?
    
    - [x] Does `gradlew build` run cleanly?
    
    - [ ] Have you written or updated unit tests to verify your changes?
    
    - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
    
    ### Note:
    Please ensure that once the PR is submitted, you check travis-ci for build issues and
    submit an update to your PR as soon as possible. If you need help, please send an
    email to d...@geode.apache.org.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/darrenfoong/geode df-GEODE-3306

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/geode/pull/668.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #668
    
----
commit d742c9bb2dc672be4ec01a98423989795748f0d8
Author: Darren Foong <darrenfo...@gmail.com>
Date:   2017-07-29T17:52:37Z

    GEODE-3306: Remove whitespace StringBuffers/nodes created by Apache Xerces
    
    This commit makes Geode compatible with the official Apache Xerces
    implementation, which calls `characters()` when it reads ignorable
    whitespace in `cache.xml`.
    
    The while loop is required to handle comments in `cache.xml`: a
    comment with whitespace before and after it generates two
    whitespace-only StringBuffers on the parse stack (one for each run of
    whitespace). The while loop removes all consecutive whitespace
    StringBuffers from the top of the stack.

----
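
The while loop described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the pull request: the class name, the helper names, and the use of `ArrayDeque` as the parse stack are all assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch; not the actual CacheXmlParser code.
class WhitespaceStackSketch {

  /** True if obj is a StringBuffer that is empty or holds only whitespace. */
  static boolean isWhitespaceBuffer(Object obj) {
    return obj instanceof StringBuffer && obj.toString().trim().isEmpty();
  }

  /**
   * Pops every consecutive whitespace-only StringBuffer from the top of the
   * stack and returns how many were removed.
   */
  static int popWhitespaceBuffers(Deque<Object> stack) {
    int removed = 0;
    while (!stack.isEmpty() && isWhitespaceBuffer(stack.peek())) {
      stack.pop();
      removed++;
    }
    return removed;
  }

  public static void main(String[] args) {
    Deque<Object> stack = new ArrayDeque<>();
    stack.push("region-attributes");          // stands in for a real parse object
    stack.push(new StringBuffer("\n    "));   // whitespace before a comment
    stack.push(new StringBuffer("\n  "));     // whitespace after the comment
    int removed = popWhitespaceBuffers(stack);
    System.out.println(removed + " removed, top = " + stack.peek());
  }
}
```

A single `if` would remove only one buffer, which is why a comment surrounded by whitespace needs the loop: it leaves two such buffers stacked on top of each other.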


> Parsing of cache.xml with whitespace fails with Apache Xerces
> -------------------------------------------------------------
>
>                 Key: GEODE-3306
>                 URL: https://issues.apache.org/jira/browse/GEODE-3306
>             Project: Geode
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.2.0
>            Reporter: Darren Foong
>            Priority: Minor
>             Fix For: 1.2.0
>
>
> I am using Geode 1.2.0 and Apache Xerces 2.11.0 (not the one included in the 
> Oracle JDK), and I encountered the following error when I tried to 
> programmatically start a cache:
> {noformat}
> org.apache.geode.InternalGemFireError: Did not expected a java.lang.StringBuffer on top of the stack.
> Exception in thread "main" org.apache.geode.InternalGemFireError: Did not expected a java.lang.StringBuffer on top of the stack.
>       at org.apache.geode.internal.Assert.throwError(Assert.java:94)
>       at org.apache.geode.internal.Assert.assertTrue(Assert.java:117)
>       at org.apache.geode.internal.cache.xmlcache.CacheXmlParser.endRegionAttributes(CacheXmlParser.java:1257)
>       at org.apache.geode.internal.cache.xmlcache.CacheXmlParser.endElement(CacheXmlParser.java:2909)
>       at org.apache.geode.internal.cache.xmlcache.CacheXmlParser$DefaultHandlerDelegate.endElement(CacheXmlParser.java:3374)
>       at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
>       at org.apache.xerces.parsers.AbstractXMLDocumentParser.emptyElement(Unknown Source)
>       at org.apache.xerces.impl.xs.XMLSchemaValidator.emptyElement(Unknown Source)
>       at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
>       at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
>       at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
>       at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>       at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>       at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>       at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
>       at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
>       at org.apache.xerces.jaxp.SAXParserImpl.parse(Unknown Source)
>       at javax.xml.parsers.SAXParser.parse(SAXParser.java:195)
>       at org.apache.geode.internal.cache.xmlcache.CacheXmlParser.parse(CacheXmlParser.java:224)
>       at org.apache.geode.internal.cache.GemFireCacheImpl.loadCacheXml(GemFireCacheImpl.java:4287)
>       at org.apache.geode.internal.cache.GemFireCacheImpl.initializeDeclarativeCache(GemFireCacheImpl.java:1390)
>       at org.apache.geode.internal.cache.GemFireCacheImpl.initialize(GemFireCacheImpl.java:1195)
>       at org.apache.geode.internal.cache.GemFireCacheImpl.basicCreate(GemFireCacheImpl.java:758)
>       at org.apache.geode.internal.cache.GemFireCacheImpl.create(GemFireCacheImpl.java:745)
>       at org.apache.geode.cache.CacheFactory.create(CacheFactory.java:173)
>       at org.apache.geode.cache.CacheFactory.create(CacheFactory.java:212)
>       at server.ServerWhitespace.main(ServerWhitespace.java:8)
> {noformat}
> However, this does not happen when I rely on the Xerces version bundled with 
> the Oracle JDK (1.8) instead of Apache Xerces.
> After getting the Geode source code and stepping through the parsing using 
> the Eclipse debugger, I realised that there were unexpected StringBuffers 
> pushed onto the parse stack, thus causing the problem.
> These StringBuffers were created and pushed by the {{characters()}} method 
> (https://github.com/apache/geode/blob/develop/geode-core/src/main/java/org/apache/geode/internal/cache/xmlcache/CacheXmlParser.java#L3270).
> Changing the log level to {{TRACE}} and examining the parse stack showed 
> that these StringBuffers contained the whitespace (including newlines) 
> between the XML tags in {{cache.xml}}.
> When using the Oracle JDK's version of Xerces, these StringBuffers did not 
> appear on the parse stack despite the whitespace.
> I have a proof of concept on GitHub: 
> https://github.com/darrenfoong/geode-parser-poc
> The {{cache.xml}} file without whitespace between the tags was parsed without 
> errors by both versions of Xerces.
> It could be that the JDK Xerces strips out whitespace while Apache Xerces 
> does not. If so, Geode could do the same stripping in {{characters()}} by 
> pushing only non-whitespace char arrays in the {{else}} block. However, there 
> could be other XML parsing edge cases that I am unaware of.
> There should be others who need Apache Xerces for their projects; a fix would 
> be appreciated.
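
The alternative suggested in the issue, filtering inside `characters()` so that whitespace-only char arrays are never recorded, might look something like the sketch below. It is a standalone SAX handler, not Geode's `CacheXmlParser`; the class name, the `chunks` list, and the `parse` helper are hypothetical. The demo parses without a DTD or schema, in which case inter-element whitespace is reported through `characters()`; which callback receives it in a validating setup depends on the parser and its configuration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch; not the actual CacheXmlParser code.
class WhitespaceFilterSketch extends DefaultHandler {

  /** Text chunks that contained at least one non-whitespace character. */
  final List<String> chunks = new ArrayList<>();

  @Override
  public void characters(char[] ch, int start, int length) {
    for (int i = start; i < start + length; i++) {
      if (!Character.isWhitespace(ch[i])) {
        // Real content: keep the whole chunk, including its own whitespace.
        chunks.add(new String(ch, start, length));
        return;
      }
    }
    // Whitespace-only chunk: record nothing.
  }

  /** Parses the given XML string and returns the recorded chunks. */
  static List<String> parse(String xml) {
    WhitespaceFilterSketch handler = new WhitespaceFilterSketch();
    try {
      SAXParserFactory.newInstance().newSAXParser()
          .parse(new InputSource(new StringReader(xml)), handler);
    } catch (Exception e) {
      throw new IllegalStateException("parse failed", e);
    }
    return handler.chunks;
  }

  public static void main(String[] args) {
    String xml = "<cache>\n  <!-- a comment -->\n  <region name=\"r\">\n"
        + "    <value>abc</value>\n  </region>\n</cache>";
    // Only the text inside <value> should survive the filter.
    System.out.println(parse(xml));
  }
}
```

One caveat with this approach: a SAX parser may deliver one text node in several `characters()` calls, so a per-chunk filter can drop a whitespace-only piece of a longer value. That is one of the edge cases a fix inside `characters()` would have to consider.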



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
