RE: SAX performance observations

Mark Weaver Wed, 03 Oct 2001 16:30:47 -0700

For string parsing a good approximation to best parsing can be provided by
using a hash table.  You will only have to do comparisons then in the case
of collisons.  Since your tags are fixed, careful choice of a hash function
and hash table size will mean that they at least don't coincide.  If you can
guarantee that you only ever see these tags (e.g. you have validated the
document against a DTD), then you can construct a perfect hash function,
which will remove all collisions - GNU gperf is good for this.


If you want to go one better then you can construct a DFA to recognise the
strings; DFAs can be generated by lex/flex (which you will have to hack on
quite a lot to support the unicode chars), or if you are feeling brave :-)
you can construct them by hand.  Basically DFAs cut out comparison and
reduce everything to table lookups (speed gained at the expense of code
bloat).  I got a good speed increase in a text parser of ~3X by recoding it
as a DFA.

However - before you go overboard with this, are you sure that your parsing
speed problems still lie in the startElement handler/string recognition?
One easy way to tell: remove all "do somethings" and time.  Remove all code
and time.  Compare the two.  If there is only a small percentage difference
then you are optimising the wrong code!

Mark

> -----Original Message-----
> From: Dee Jay Randall [mailto:[EMAIL PROTECTED]]
> Sent: 04 October 2001 00:54
> To: [EMAIL PROTECTED]
> Subject: SAX performance observations
>
>
>
>   Two observations on performance implementing a class derived
> from DefaultHandler:
>
>
> 1. If you use a large conditional for handling different tags, you
> should do the comparisons in the order of most likely to be seen.
>
> element distribution: Tag1 (10%); Tag2 (70%); Tag3 (20%)
>
> Then:
>
> void startElement(...)
> {
>   if ( DOMString("Tag2").equals(localname) )
>     ; // do something
>   else if ( DOMString("Tag3").equals(localname) )
>     ; // do something
>   else if ( DOMString("Tag1").equals(localname) )
>     ; // do something
> }
>
>   In my application (with 18 tags), this made about a 3 X performance
> increase.
>
>   Also, it should save string comparisons if all the tags were distinct
> by their first character, but I don't seriously recommend this.
>
>
>
>
> 2. It appears that it is very expensive to do a DOMString("MyTag").
> Basically don't ever do this:
>
> void startElement(...)
> {
>   if ( DOMString("MyTag").equals(localname) )
>     ; // do something
> }
>
>
> Instead, do this:
>
> void startElement(...)
> {
>   static const DOMString cDOMStringMyTag("MyTag");
>
>   if ( cDOMStringMyTag.equals(localname) )
>     ; // do something
> }
>
>   In my application this caused about a 7 X performance increase on
> top of the previous 3 X increase. I went from reading 100K elements
> in 240 seconds to about 11 seconds.
>
>
>
>   Can anyone offer any comments on further accelerating my parsing?
>
>
>   Thanks,
>   Dee Jay
>
> +-----------------------------+------------------+-----------------------+
> | Founding Partner            | Software Engineer| Dee Jay Randall, B.Sc.|
> | Circular Reasoning          | Accrue Software  | M.Sc. Student, CS     |
> | [EMAIL PROTECTED]| www.accrue.com   | ICQ # 43551676        |
> +-----------------------------+------------------+-----------------------+
> What is the average rank of every song ever written? 42  -- www.launch.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

RE: SAX performance observations

Reply via email to