Thanks Simon,

Do you have a code fragment that would illustrate your (b) approach?
jgp

> -----Original Message-----
> From: Simon Kitching [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, January 14, 2003 02:03
> To: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
> Subject: RE: Enhancing parsing performance
>
> Hi,
>
> > Turn validation off!
>
> Unfortunately, turning validation off won't speed things up very much.
>
> Essentially, disabling validation only *suppresses* error messages about
> invalid input. The DTD or schema is still processed because it can
> contain things like default attribute values or entity definitions which
> should be applied even when validation is disabled.
>
> If you are really sure that the DTD or schema doesn't contain any data
> that will affect the xml document being parsed, then you can either:
> (a) use a feature like:
> http://xml.apache.org/xerces2-j/features.html#external-parameter-entities
> or
> http://xml.apache.org/xerces2-j/features.html#nonvalidating.load-dtd-grammar
> or
> (b) register a custom EntityResolver object with the parser, which
> returns an empty DTD or schema when asked for the external one
>
> I'm not sure which of the features listed in (a) above is the one you
> want. I use approach (b) currently.
>
> If you do need to process the DTD, then you can use a custom
> EntityResolver to look up a locally cached version. I don't know if
> there are already implementations for catalog lookup, etc. or if you
> will have to roll your own.
>
> Hope this helps,
>
> Simon
>
> On Tue, 2003-01-14 at 12:18, Brian Madigan wrote:
> > DOMParser parser = new DOMParser( );
> > parser.setFeature("http://xml.org/sax/features/validation", false);
> > or something to that effect. If I am not mistaken,
> > that should stop any dtd validation from happening.
> >
> > --- Jean Georges PERRIN <[EMAIL PROTECTED]> wrote:
> > > Hi,
> > >
> > > Thanks for the hope message!
> > >
> > > I was timing the whole method, I focused on parser creation and
> > > parse time now.
> > >
> > > I changed my code to:
> > > public void load () {
> > >   DOMParser parser;
> > >   Logger log = ThinStructureConfiguration.getInstance().getLogger();
> > >
> > >   try {
> > >     long start = System.currentTimeMillis();
> > >     parser = new DOMParser();
> > >     long stop = System.currentTimeMillis();
> > >     log.finest ("Creating parser took " + (stop - start) + " ms");
> > >   }
> > >   catch (Exception e) {
> > >     log.severe ("Error: Unable to instantiate parser");
> > >     return;
> > >   }
> > >
> > >   try {
> > >     long start = System.currentTimeMillis();
> > >     parser.parse(m_file.toURI().toString());
> > >     long stop = System.currentTimeMillis();
> > >     log.finest ("Parsing of " + m_file.getName() + " took " + (stop - start) + " ms");
> > >     m_document = parser.getDocument();
> > >   }
> > >   catch (SAXParseException e) {
> > >     // ignore
> > >   }
> > >   catch (Exception e) {
> > >     String msg;
> > >     msg = ("Error: Parse error occurred, " + e.getMessage());
> > >     if (e instanceof SAXException) {
> > >       e = ((SAXException)e).getException();
> > >     }
> > >     msg += '\n' + e.toString();
> > >     log.severe (msg);
> > >   }
> > > }
> > >
> > > Results are:
> > > Jan 13, 2003 11:52:20 PM com.awoma.ts.ui.impl.XHTML11Window load
> > > FINEST: Creating parser took 251 ms
> > > Jan 13, 2003 11:52:25 PM com.awoma.ts.ui.impl.XHTML11Window load
> > > FINEST: Parsing of emailpassword.xhtml took 5227 ms
> > > Jan 13, 2003 11:52:25 PM com.awoma.ts.ui.Store add
> > > INFO: Window definition emailpassword.xhtml added.
> > > Jan 13, 2003 11:52:25 PM com.awoma.ts.ui.impl.XHTML11Window load
> > > FINEST: Creating parser took 10 ms
> > > Jan 13, 2003 11:52:29 PM com.awoma.ts.ui.impl.XHTML11Window load
> > > FINEST: Parsing of emailpassword2.xhtml took 3085 ms
> > > Jan 13, 2003 11:52:29 PM com.awoma.ts.ui.Store add
> > > INFO: Window definition emailpassword2.xhtml added.
> > > Jan 13, 2003 11:52:29 PM com.awoma.ts.ui.impl.XHTML11Window load
> > > FINEST: Creating parser took 0 ms
> > > Jan 13, 2003 11:52:29 PM com.awoma.ts.ui.impl.XHTML11Window load
> > > FINEST: Parsing of emailpassword3.xhtml took 10 ms
> > > Jan 13, 2003 11:52:29 PM com.awoma.ts.ui.Store add
> > > INFO: Window definition emailpassword3.xhtml added.
> > > Jan 13, 2003 11:52:29 PM com.awoma.ts.ui.impl.XHTML11Window load
> > > FINEST: Creating parser took 0 ms
> > > Jan 13, 2003 11:52:31 PM com.awoma.ts.ui.impl.XHTML11Window load
> > > FINEST: Parsing of emailpassword4.xhtml took 2774 ms
> > >
> > > All files are identical, except #3 where I removed all references
> > > to the external world.
> > >
> > > I use Xerces J 2.2.1 (according to build.xml).
> > >
> > > Conclusions & questions:
> > > 1/ Creation of DOMParser() is slow the first time, but ridiculous
> > > afterwards, so there is no need for enhancing that much.
> > > 2/ My parser seems to want to check the validity through external
> > > connection. How can I remove those without modifying all my files?
> > >
> > > jgp
> > >
> > > > -----Original Message-----
> > > > From: Simon Kitching [mailto:[EMAIL PROTECTED]
> > > > Sent: Monday, January 13, 2003 23:24
> > > > To: [EMAIL PROTECTED]
> > > > Cc: [EMAIL PROTECTED]
> > > > Subject: Re: Enhancing parsing performance
> > > >
> > > > Hi Jean Georges,
> > > >
> > > > Firstly, does the document you are parsing contain a DTD or schema
> > > > reference? If it uses http://acme.com/xyz.dtd, then much of your
> > > > parsing time may actually be in retrieval of the remote dtd. And
> > > > if the dtd/schema is large then time will be spent processing it.
> > > > If this is the case, there are optimisations available for both
> > > > these problems.
> > > >
> > > > Secondly, you don't say exactly what you are timing.
> > > > Is it the complete application time, or the time taken by the
> > > > method you include below, or just the time for the parse method?
> > > >
> > > > Thirdly, you don't mention which version of Xerces you are using...
> > > >
> > > > Providing information on the above would allow people to provide
> > > > better suggestions for you..
> > > >
> > > > I certainly see better performance than you do, so there is hope :-)
> > > >
> > > > Regards,
> > > >
> > > > Simon
> > > >
> > > > On Tue, 2003-01-14 at 10:56, Jean Georges PERRIN wrote:
> > > > > Hi,
> > > > >
> > > > > Thanks for those who helped me with cloning...
> > > > >
> > > > > I am a little surprised with performance. Maybe there are some
> > > > > basic things I am doing wrong.
> > > > >
> > > > > I am parsing a 3 Kb XHTML file and it takes me about 4s, cloning
> > > > > the tree takes me roughly a ridiculous amount of time (10ms).
> > > > > This on an Athlon XP 1800+ running XP (sure I could switch to
> > > > > Linux but it is not planned for now :) ).
> > > > >
> > > > > My code for parsing:
> > > > > protected void load () {
> > > > >   DOMParser parser;
> > > > >
> > > > >   try {
> > > > >     parser = new DOMParser();
> > > > >   }
> > > > >   catch (Exception e) {
> > > > >     log.severe ("Error: Unable to instantiate parser");
> > > > >     return;
> > > > >   }
> > > > >
> > > > >   try {
> > > > >     parser.parse(m_file.toURI().toString());
> > > > >     m_document = parser.getDocument();
> > > > >   }
> > > > >   catch (SAXParseException e) {
> > > > >     // ignore
> > > > >   }
> > > > >   catch (Exception e) {
> > > > >     String msg;
> > > > >     msg = ("Error: Parse error occurred, " + e.getMessage());
> > > > >     if (e instanceof SAXException) {
> > > > >       e = ((SAXException)e).getException();
> > > > >     }
> > > > >     msg += '\n' + e.toString();
> > > > >     log.severe (msg);
> > > > >   }
> > > > > }
> > > > >
> > > > > Questions:
> > > > > 1/ is static'ing my parser will enhance the process?
> > > > > 2/ can I "pre" create some objects I can reuse?
> > > > > 3/ are there some eventual verification I can turn
> > === message truncated ===
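
For reference, approach (b) from Simon's message quoted above (a custom EntityResolver that hands back an empty DTD) might look roughly like the sketch below. It is not code from the thread and makes a few assumptions: it goes through the JAXP DocumentBuilder rather than the Xerces DOMParser used above, so that it runs without extra jars, and the names EmptyDtdResolverDemo, EMPTY_RESOLVER and parseIgnoringDtd are made up for illustration. DOMParser exposes the same setEntityResolver(EntityResolver) hook, so the resolver itself should carry over.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical demo class; none of these names come from the thread.
class EmptyDtdResolverDemo {

    // Answers every external entity request (including the external DTD)
    // with an empty character stream, so nothing is fetched over the network.
    static final EntityResolver EMPTY_RESOLVER = new EntityResolver() {
        public InputSource resolveEntity(String publicId, String systemId) {
            return new InputSource(new StringReader(""));
        }
    };

    // Parses the given XML text with the empty-DTD resolver installed.
    public static Document parseIgnoringDtd(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setEntityResolver(EMPTY_RESOLVER);
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // A small XHTML 1.1 document whose DOCTYPE points at a remote DTD.
        String xml = "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\" "
            + "\"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\">\n"
            + "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
            + "<head><title>t</title></head><body/></html>";
        Document doc = parseIgnoringDtd(xml);
        // Print the root element name to show that the parse completed.
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

Simon's caveat still applies: an empty DTD declares nothing, so any default attribute values or entity definitions the real DTD would have supplied are not available to the parser. Returning an InputSource over a local copy of the DTD from resolveEntity, instead of an empty one, would correspond to the locally cached variant he mentions.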
