RE: Enhancing parsing performance

Jean Georges PERRIN 14 Jan 2003 06:43:46 -0000

Thanks Simon,

Do you have a code fragment that would illustrate your (b) approach?


jgp 

> -----Original Message-----
> From: Simon Kitching [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, January 14, 2003 02:03
> To: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
> Subject: RE: Enhancing parsing performance
> 
> Hi,
> 
> > Turn validation off!
> 
> Unfortunately, turning validation off won't speed things up very much.
> 
> Essentially, disabling validation only *suppresses* error messages about
> invalid input. The DTD or schema is still processed because it can
> contain things like default attribute values or entity definitions which
> should be applied even when validation is disabled.
> 
> If you are really sure that the DTD or schema doesn't contain any data
> that will affect the xml document being parsed, then you can either:
> (a) use a feature like:
> http://xml.apache.org/xerces2-j/features.html#external-parameter-entities
> or
> http://xml.apache.org/xerces2-j/features.html#nonvalidating.load-dtd-grammar
> or
> (b) register a custom EntityResolver object with the parser, which
> returns an empty DTD or schema when asked for the external one
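A minimal sketch of approach (b), for illustration only. It assumes the JAXP DocumentBuilder bundled with the JDK instead of the Xerces DOMParser class used elsewhere in this thread (DOMParser accepts a resolver through setEntityResolver as well). The names EmptyDtdResolverSketch, EmptyResolver and parse are made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

class EmptyDtdResolverSketch {

    // Hypothetical resolver: answers every external entity request with an
    // empty document, so the parser never fetches the remote DTD.
    static class EmptyResolver implements EntityResolver {
        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            return new InputSource(new StringReader(""));
        }
    }

    // Parses an XML string with validation off and the empty resolver installed.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setEntityResolver(new EmptyResolver());
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\" "
                + "\"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\">"
                + "<html><body>hello</body></html>";
        Document doc = parse(xml);
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

As the caveat above says, this is only safe when the document does not rely on the DTD: a file that uses entities declared only there (such as &nbsp; in XHTML) will no longer parse.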
> 
> I'm not sure which of the features listed in (a) above is the one you
> want. I use approach (b) currently.
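For approach (a), a rough sketch of switching off a parser feature. Which feature is the right one is left open above; this example assumes the Xerces feature "nonvalidating/load-external-dtd", a neighbour of the two linked ones, and sets it through the JDK's JAXP factory. With DOMParser the equivalent would be a parser.setFeature(id, false) call. SkipExternalDtdSketch and its parse method are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class SkipExternalDtdSketch {

    // Xerces feature ID; assumed here to be the one that stops the
    // external DTD from being loaded when validation is off.
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    // Parses an XML string without validating and without loading the external DTD.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);
        factory.setFeature(LOAD_EXTERNAL_DTD, false);
        return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }
}
```

The same restriction applies as for the empty resolver: defaults and entities defined in the skipped DTD are not applied.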
> 
> If you do need to process the DTD, then you can use a custom
> EntityResolver to look up a locally cached version. I don't know if
> there are already implementations for catalog lookup, etc. or if you
> will have to roll your own.
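A possible shape for the "roll your own" case: a resolver that maps system IDs to local copies of the DTD. This is a sketch under assumptions, not an existing catalog implementation; CachedDtdResolver and its map of system ID to local path are invented for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

class CachedDtdResolver implements EntityResolver {

    // Hypothetical lookup table: remote system ID -> local cached file.
    private final Map<String, Path> localCopies;

    CachedDtdResolver(Map<String, Path> localCopies) {
        this.localCopies = localCopies;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws IOException {
        Path local = localCopies.get(systemId);
        if (local == null) {
            // Not cached: returning null lets the parser use its default lookup.
            return null;
        }
        InputSource source = new InputSource(Files.newInputStream(local));
        // Keep the original IDs so relative references inside the DTD are
        // resolved against the remote location and come back to this resolver.
        source.setPublicId(publicId);
        source.setSystemId(systemId);
        return source;
    }
}
```

It would be registered with setEntityResolver on the parser, in the same way as any other EntityResolver.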
> 
> Hope this helps,
> 
> Simon
> 
> On Tue, 2003-01-14 at 12:18, Brian Madigan wrote:
> > DOMParser parser = new DOMParser( );
> > parser.setFeature
> >             ("http://xml.org/sax/features/validation", false);
> > or something to that effect. If I am not mistaken,
> > that should stop any dtd validation from happening.
> >
> > --- Jean Georges PERRIN <[EMAIL PROTECTED]> wrote:
> > > Hi,
> > >
> > > Thanks for the hopeful message!
> > >
> > > I was timing the whole method; I have now focused on parser creation and
> > > parse time.
> > >
> > > I changed my code to:
> > >   public void load () {
> > >     DOMParser parser;
> > >     Logger log = ThinStructureConfiguration.getInstance().getLogger();
> > >
> > >     try {
> > >       long start = System.currentTimeMillis();
> > >       parser = new DOMParser();
> > >       long stop = System.currentTimeMillis();
> > >       log.finest ("Creating parser took " + (stop - start) + " ms");
> > >     }
> > >     catch (Exception e) {
> > >       log.severe ("Error: Unable to instantiate parser");
> > >       return;
> > >     }
> > >
> > >     try {
> > >       long start = System.currentTimeMillis();
> > >       parser.parse(m_file.toURI().toString());
> > >       long stop = System.currentTimeMillis();
> > >       log.finest ("Parsing of " + m_file.getName() + " took " + (stop - start) + " ms");
> > >       m_document = parser.getDocument();
> > >     }
> > >     catch (SAXParseException e) {
> > >       // ignore
> > >     }
> > >     catch (Exception e) {
> > >       String msg;
> > >       msg = ("Error: Parse error occurred, " + e.getMessage());
> > >       if (e instanceof SAXException) {
> > >         e = ((SAXException)e).getException();
> > >       }
> > >       msg += '\n' + e.toString();
> > >       log.severe (msg);
> > >     }
> > >   }
> > >
> > > Results are:
> > > Jan 13, 2003 11:52:20 PM com.awoma.ts.ui.impl.XHTML11Window load
> > > FINEST: Creating parser took 251 ms
> > > Jan 13, 2003 11:52:25 PM com.awoma.ts.ui.impl.XHTML11Window load
> > > FINEST: Parsing of emailpassword.xhtml took 5227 ms
> > > Jan 13, 2003 11:52:25 PM com.awoma.ts.ui.Store add
> > > INFO: Window definition emailpassword.xhtml added.
> > > Jan 13, 2003 11:52:25 PM com.awoma.ts.ui.impl.XHTML11Window load
> > > FINEST: Creating parser took 10 ms
> > > Jan 13, 2003 11:52:29 PM com.awoma.ts.ui.impl.XHTML11Window load
> > > FINEST: Parsing of emailpassword2.xhtml took 3085 ms
> > > Jan 13, 2003 11:52:29 PM com.awoma.ts.ui.Store add
> > > INFO: Window definition emailpassword2.xhtml added.
> > > Jan 13, 2003 11:52:29 PM com.awoma.ts.ui.impl.XHTML11Window load
> > > FINEST: Creating parser took 0 ms
> > > Jan 13, 2003 11:52:29 PM com.awoma.ts.ui.impl.XHTML11Window load
> > > FINEST: Parsing of emailpassword3.xhtml took 10 ms
> > > Jan 13, 2003 11:52:29 PM com.awoma.ts.ui.Store add
> > > INFO: Window definition emailpassword3.xhtml added.
> > > Jan 13, 2003 11:52:29 PM com.awoma.ts.ui.impl.XHTML11Window load
> > > FINEST: Creating parser took 0 ms
> > > Jan 13, 2003 11:52:31 PM com.awoma.ts.ui.impl.XHTML11Window load
> > > FINEST: Parsing of emailpassword4.xhtml took 2774 ms
> > >
> > > All files are identical, except #3, from which I removed all references
> > > to the external world.
> > >
> > > I use Xerces J 2.2.1 (according to build.xml).
> > >
> > > Conclusions & questions:
> > > 1/ Creating a DOMParser() is slow the first time but negligible
> > > afterwards, so there is little to gain by optimizing it.
> > > 2/ My parser seems to want to check validity through an external
> > > connection. How can I disable that without modifying all my files?
> > >
> > > jgp
> > >
> > > > -----Original Message-----
> > > > From: Simon Kitching [mailto:[EMAIL PROTECTED]
> > > > Sent: Monday, January 13, 2003 23:24
> > > > To: [EMAIL PROTECTED]
> > > > Cc: [EMAIL PROTECTED]
> > > > Subject: Re: Enhancing parsing performance
> > > >
> > > > Hi Jean Georges,
> > > >
> > > > Firstly, does the document you are parsing contain a DTD or schema
> > > > reference? If it uses http://acme.com/xyz.dtd, then much of your parsing
> > > > time may actually be in retrieval of the remote dtd. And if the
> > > > dtd/schema is large then time will be spent processing it. If this is
> > > > the case, there are optimisations available for both these problems.
> > > >
> > > > Secondly, you don't say exactly what you are timing. Is it the complete
> > > > application time, or the time taken by the method you include below, or
> > > > just the time for the parse method?
> > > >
> > > > Thirdly, you don't mention which version of Xerces you are using...
> > > >
> > > > Providing information on the above would allow people to provide better
> > > > suggestions for you.
> > > >
> > > > I certainly see better performance than you do, so there is hope :-)
> > > >
> > > > Regards,
> > > >
> > > > Simon
> > > >
> > > > On Tue, 2003-01-14 at 10:56, Jean Georges PERRIN wrote:
> > > > > Hi,
> > > > >
> > > > > Thanks to those who helped me with cloning...
> > > > >
> > > > > I am a little surprised by the performance. Maybe there are some basic
> > > > > things I am doing wrong.
> > > > >
> > > > > I am parsing a 3 KB XHTML file and it takes about 4 s, while cloning
> > > > > the tree takes a negligible amount of time (10 ms). This is on an
> > > > > Athlon XP 1800+ running XP (sure, I could switch to Linux, but that is
> > > > > not planned for now :) ).
> > > > >
> > > > > My code for parsing:
> > > > >   protected void load () {
> > > > >     DOMParser parser;
> > > > >
> > > > >     try {
> > > > >       parser = new DOMParser();
> > > > >     }
> > > > >     catch (Exception e) {
> > > > >       log.severe ("Error: Unable to instantiate parser");
> > > > >       return;
> > > > >     }
> > > > >
> > > > >     try {
> > > > >       parser.parse(m_file.toURI().toString());
> > > > >       m_document = parser.getDocument();
> > > > >     }
> > > > >     catch (SAXParseException e) {
> > > > >       // ignore
> > > > >     }
> > > > >     catch (Exception e) {
> > > > >       String msg;
> > > > >       msg = ("Error: Parse error occurred, " + e.getMessage());
> > > > >       if (e instanceof SAXException) {
> > > > >         e = ((SAXException)e).getException();
> > > > >       }
> > > > >       msg += '\n' + e.toString();
> > > > >       log.severe (msg);
> > > > >     }
> > > > >   }
> > > > >
> > > > > Questions:
> > > > > 1/ Will making my parser static improve the process?
> > > > > 2/ Can I pre-create some objects that I can reuse?
> > > > > 3/ Are there any verifications I can turn
> > === message truncated ===
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
> >
> >

