RE: Enhancing parsing performance

Simon Kitching 14 Jan 2003 01:06:38 -0000

Hi,

> Turn validation off!


Unfortunately, turning validation off won't speed things up very much.

Essentially, disabling validation only *suppresses* error messages about
invalid input. The DTD or schema is still processed because it can
contain things like default attribute values or entity definitions which
should be applied even when validation is disabled.
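
To make that concrete, here is a minimal sketch of a default attribute
value being applied with validation switched off. It is only an
illustration: it assumes the JAXP DocumentBuilderFactory API rather than
the Xerces DOMParser class (so it runs on a plain JDK), and the class
name, the "note" element and the "lang" attribute are all made up.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DefaultAttributeDemo {
    // Hypothetical document: the internal DTD subset declares a default
    // value for "lang", and the <note> element does not set it itself.
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE note [\n"
        + "  <!ELEMENT note (#PCDATA)>\n"
        + "  <!ATTLIST note lang CDATA \"en\">\n"
        + "]>\n"
        + "<note>hello</note>";

    // Parses with validation disabled and returns the value of "lang".
    static String parseAndReadLang(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false); // already the JAXP default; set for emphasis
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getAttribute("lang");
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseAndReadLang(XML));
    }
}
```

The attribute comes back as "en" even though nothing was validated,
which is why the parser cannot simply skip the DTD.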

If you are really sure that the DTD or schema doesn't contain any data
that will affect the XML document being parsed, then you can either:
(a) use a feature like
    http://xml.apache.org/xerces2-j/features.html#external-parameter-entities
    or
    http://xml.apache.org/xerces2-j/features.html#nonvalidating.load-dtd-grammar
or
(b) register a custom EntityResolver object with the parser, which
    returns an empty DTD or schema when asked for the external one

I'm not sure which of the features listed in (a) above is the one you
want. I use approach (b) currently.
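
A rough sketch of both approaches follows, assuming the JAXP
DocumentBuilderFactory API rather than DOMParser. The class name, method
names and sample document are hypothetical. For (a) the sketch sets the
external-parameter-entities feature and also load-external-dtd, a Xerces
feature that is not one of the two listed above; whether a given parser
accepts these feature ids is an assumption to check against its docs.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class SkipExternalDtd {
    // Hypothetical input: the system id points at a DTD that does not exist,
    // so a parser that tries to load it would fail.
    static final String XML =
        "<!DOCTYPE page SYSTEM \"file:///no-such-dir/page.dtd\">\n"
        + "<page>hello</page>";

    // Approach (a): ask the parser not to read the external DTD subset.
    static Document parseWithFeatures(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false);
            factory.setFeature(
                "http://xml.org/sax/features/external-parameter-entities", false);
            // Assumption: the parser in use supports this Xerces feature id.
            factory.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    // Approach (b): answer every external entity request with empty content.
    // Note this also blanks out external general entities, not just the DTD.
    static Document parseWithEmptyResolver(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }
}
```

Either way, any default attribute values or entity definitions in the
real DTD are lost, so this only suits documents that do not rely on them.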

If you do need to process the DTD, then you can use a custom
EntityResolver to look up a locally cached version. I don't know if
there are already implementations for catalog lookup, etc. or if you
will have to roll your own.
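
If you do roll your own, a minimal sketch could look like the one below.
It assumes the cached DTDs are held in memory in a map keyed by system
id; the class name, the example.invalid URL and the DTD text are all
hypothetical, and a real cache would more likely read local files.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

class CachedDtdResolver implements EntityResolver {
    // Maps a system id (as written in the DOCTYPE) to locally cached DTD text.
    private final Map<String, String> cache;

    CachedDtdResolver(Map<String, String> cache) {
        this.cache = cache;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String dtd = cache.get(systemId);
        if (dtd == null) {
            return null; // not cached: let the parser do its normal lookup
        }
        InputSource source = new InputSource(new StringReader(dtd));
        source.setSystemId(systemId); // keep the original id for error messages
        return source;
    }

    // Parses a document whose DTD lives only in the cache and returns the
    // "lang" attribute, which is defaulted by that cached DTD.
    static String parseAndReadLang() {
        String systemId = "http://example.invalid/page.dtd"; // hypothetical URL
        Map<String, String> cache = new HashMap<>();
        cache.put(systemId,
            "<!ELEMENT page (#PCDATA)>\n<!ATTLIST page lang CDATA \"en\">");
        String xml = "<!DOCTYPE page SYSTEM \"" + systemId + "\">\n<page>hello</page>";
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setEntityResolver(new CachedDtdResolver(cache));
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getAttribute("lang");
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }
}
```

Because the DTD content is still processed, defaults and entities keep
working; only the remote fetch is avoided.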

Hope this helps,

Simon

On Tue, 2003-01-14 at 12:18, Brian Madigan wrote:
> DOMParser parser = new DOMParser( );
> parser.setFeature
>             ("http://xml.org/sax/features/validation",
>             false);
> or something to that effect. If I am not mistaken,
> that should stop any dtd validation from happening.
> 
> --- Jean Georges PERRIN <[EMAIL PROTECTED]> wrote:
> > Hi,
> > 
> > Thanks for the hope message!
> > 
> > I was timing the whole method; I now time parser creation and
> > parsing separately.
> > 
> > I changed my code to:
> >   public void load () {
> >     DOMParser parser;
> >     Logger log = ThinStructureConfiguration.getInstance().getLogger();
> >     
> >     try {
> >       long start = System.currentTimeMillis();
> >       parser = new DOMParser();
> >       long stop = System.currentTimeMillis();
> >       log.finest ("Creating parser took " + (stop - start) + " ms");
> >     }
> >     catch (Exception e) {
> >       log.severe ("Error: Unable to instantiate parser");
> >       return;
> >     }
> > 
> >     try {
> >       long start = System.currentTimeMillis();
> >       parser.parse(m_file.toURI().toString());
> >       long stop = System.currentTimeMillis();
> >       log.finest ("Parsing of " + m_file.getName() + " took "
> >                   + (stop - start) + " ms");
> >       m_document = parser.getDocument();
> >     }
> >     catch (SAXParseException e) {
> >       // ignore
> >     }
> >     catch (Exception e) {
> >       String msg;
> >       msg = ("Error: Parse error occurred, " + e.getMessage());
> >       if (e instanceof SAXException) {
> >         e = ((SAXException)e).getException();
> >       }
> >       msg += '\n' + e.toString();
> >       log.severe (msg);
> >     }
> >   }
> > 
> > Results are:
> > Jan 13, 2003 11:52:20 PM com.awoma.ts.ui.impl.XHTML11Window load
> > FINEST: Creating parser took 251 ms
> > Jan 13, 2003 11:52:25 PM com.awoma.ts.ui.impl.XHTML11Window load
> > FINEST: Parsing of emailpassword.xhtml took 5227 ms
> > Jan 13, 2003 11:52:25 PM com.awoma.ts.ui.Store add
> > INFO: Window definition emailpassword.xhtml added.
> > Jan 13, 2003 11:52:25 PM com.awoma.ts.ui.impl.XHTML11Window load
> > FINEST: Creating parser took 10 ms
> > Jan 13, 2003 11:52:29 PM com.awoma.ts.ui.impl.XHTML11Window load
> > FINEST: Parsing of emailpassword2.xhtml took 3085 ms
> > Jan 13, 2003 11:52:29 PM com.awoma.ts.ui.Store add
> > INFO: Window definition emailpassword2.xhtml added.
> > Jan 13, 2003 11:52:29 PM com.awoma.ts.ui.impl.XHTML11Window load
> > FINEST: Creating parser took 0 ms
> > Jan 13, 2003 11:52:29 PM com.awoma.ts.ui.impl.XHTML11Window load
> > FINEST: Parsing of emailpassword3.xhtml took 10 ms
> > Jan 13, 2003 11:52:29 PM com.awoma.ts.ui.Store add
> > INFO: Window definition emailpassword3.xhtml added.
> > Jan 13, 2003 11:52:29 PM com.awoma.ts.ui.impl.XHTML11Window load
> > FINEST: Creating parser took 0 ms
> > Jan 13, 2003 11:52:31 PM com.awoma.ts.ui.impl.XHTML11Window load
> > FINEST: Parsing of emailpassword4.xhtml took 2774 ms
> > 
> > All files are identical, except #3 where I removed
> > all references to the
> > external world.
> > 
> > I use Xerces J 2.2.1 (according to build.xml).
> > 
> > Conclusions & questions:
> > 1/ Creation of DOMParser() is slow the first time, but negligible
> > afterwards, so there is not much to gain there.
> > 2/ My parser seems to check validity over an external connection.
> > How can I prevent that without modifying all my files?
> > 
> > jgp 
> > 
> > > -----Original Message-----
> > > From: Simon Kitching [mailto:[EMAIL PROTECTED]
> > > Sent: Monday, January 13, 2003 23:24
> > > To: [EMAIL PROTECTED]
> > > Cc: [EMAIL PROTECTED]
> > > Subject: Re: Enhancing parsing performance
> > > 
> > > Hi Jean Georges,
> > > 
> > > Firstly, does the document you are parsing contain a DTD or schema
> > > reference? If it uses http://acme.com/xyz.dtd, then much of your parsing
> > > time may actually be in retrieval of the remote dtd. And if the
> > > dtd/schema is large then time will be spent processing it. If this is
> > > the case, there are optimisations available for both these problems.
> > >
> > > Secondly, you don't say exactly what you are timing. Is it the complete
> > > application time, or the time taken by the method you include below, or
> > > just the time for the parse method?
> > >
> > > Thirdly, you don't mention which version of Xerces you are using...
> > >
> > > Providing information on the above would allow people to provide better
> > > suggestions for you.
> > >
> > > I certainly see better performance than you do, so there is hope :-)
> > > 
> > > Regards,
> > > 
> > > Simon
> > > 
> > > On Tue, 2003-01-14 at 10:56, Jean Georges PERRIN wrote:
> > > > Hi,
> > > >
> > > > Thanks to those who helped me with cloning...
> > > >
> > > > I am a little surprised by the performance. Maybe there are some
> > > > basic things I am doing wrong.
> > > >
> > > > I am parsing a 3 KB XHTML file and it takes about 4 s, while cloning
> > > > the tree takes a negligible amount of time (roughly 10 ms). This is
> > > > on an Athlon XP 1800+ running XP (sure I could switch to Linux but it
> > > > is not planned for now :) ).
> > > >
> > > > My code for parsing:
> > > >   protected void load () {
> > > >     DOMParser parser;
> > > >
> > > >     try {
> > > >       parser = new DOMParser();
> > > >     }
> > > >     catch (Exception e) {
> > > >       log.severe ("Error: Unable to instantiate parser");
> > > >       return;
> > > >     }
> > > >
> > > >     try {
> > > >       parser.parse(m_file.toURI().toString());
> > > >       m_document = parser.getDocument();
> > > >     }
> > > >     catch (SAXParseException e) {
> > > >       // ignore
> > > >     }
> > > >     catch (Exception e) {
> > > >       String msg;
> > > >       msg = ("Error: Parse error occurred, " + e.getMessage());
> > > >       if (e instanceof SAXException) {
> > > >         e = ((SAXException)e).getException();
> > > >       }
> > > >       msg += '\n' + e.toString();
> > > >       log.severe (msg);
> > > >     }
> > > >   }
> > > >
> > > > Questions:
> > > > 1/ will making my parser static improve performance?
> > > > 2/ can I "pre" create some objects I can reuse?
> > > > 3/ are there some eventual verification I can turn
> === message truncated ===
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 

