For string parsing a good approximation to best parsing can be provided by using a hash table. You will only have to do comparisons then in the case of collisons. Since your tags are fixed, careful choice of a hash function and hash table size will mean that they at least don't coincide. If you can guarantee that you only ever see these tags (e.g. you have validated the document against a DTD), then you can construct a perfect hash function, which will remove all collisions - GNU gperf is good for this.
If you want to go one better then you can construct a DFA to recognise the strings; DFAs can be generated by lex/flex (which you will have to hack on quite a lot to support the unicode chars), or if you are feeling brave :-) you can construct them by hand. Basically DFAs cut out comparison and reduce everything to table lookups (speed gained at the expense of code bloat). I got a good speed increase in a text parser of ~3X by recoding it as a DFA. However - before you go overboard with this, are you sure that your parsing speed problems still lie in the startElement handler/string recognition? One easy way to tell: remove all "do somethings" and time. Remove all code and time. Compare the two. If there is only a small percentage difference then you are optimising the wrong code! Mark > -----Original Message----- > From: Dee Jay Randall [mailto:[EMAIL PROTECTED]] > Sent: 04 October 2001 00:54 > To: [EMAIL PROTECTED] > Subject: SAX performance observations > > > > Two observations on performance implementing a class derived > from DefaultHandler: > > > 1. If you use a large conditional for handling different tags, you > should do the comparisons in the order of most likely to be seen. > > element distribution: Tag1 (10%); Tag2 (70%); Tag3 (20%) > > Then: > > void startElement(...) > { > if ( DOMString("Tag2").equals(localname) ) > ; // do something > else if ( DOMString("Tag3").equals(localname) ) > ; // do something > else if ( DOMString("Tag1").equals(localname) ) > ; // do something > } > > In my application (with 18 tags), this made about a 3 X performance > increase. > > Also, it should save string comparisons if all the tags were distinct > by their first character, but I don't seriously recommend this. > > > > > 2. It appears that it is very expensive to do a DOMString("MyTag"). > Basically don't ever do this: > > void startElement(...) > { > if ( DOMString("MyTag").equals(localname) ) > ; // do something > } > > > Instead, do this: > > void startElement(...) > { > static const DOMString cDOMStringMyTag("MyTag"); > > if ( cDOMStringMyTag.equals(localname) ) > ; // do something > } > > In my application this caused about a 7 X performance increase on > top of the previous 3 X increase. I went from reading 100K elements > in 240 seconds to about 11 seconds. > > > > Can anyone offer any comments on further accelerating my parsing? > > > Thanks, > Dee Jay > > +-----------------------------+------------------+-----------------------+ > | Founding Partner | Software Engineer| Dee Jay Randall, B.Sc.| > | Circular Reasoning | Accrue Software | M.Sc. Student, CS | > | [EMAIL PROTECTED]| www.accrue.com | ICQ # 43551676 | > +-----------------------------+------------------+-----------------------+ > What is the average rank of every song ever written? 42 -- www.launch.com > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
