Scintilla's use of PropSet is very simple (32 keys are inserted by SciTE) and hasn't appeared important in profiles. SciTE's use is heavier, but the main problem there is not hashed access but file-pattern-based properties, which require exhaustive searches. Say the file 'hash.xx' has been opened and SciTE goes looking for a lexer as specified in one of the lexer.<filepattern> properties: lexer.*.cxx, lexer.*.hh;*.xx and lexer.*.js;*.html. It can't search for a particular hash because the <filepattern> may contain more than one pattern. Therefore it looks at every single property, checking for the prefix "lexer." and then whether the filename matches the rest of the key name. There are various improvements that could be made to the data structure to address this, such as linking together all the keys with a particular prefix. While the time taken by these searches is noticeable in profiles, it has never been slow enough to be worth fixing.
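As a rough illustration of the exhaustive search described above, here is a minimal sketch. It is not SciTE's actual code: the names FindByFilePattern and MatchesPattern are hypothetical, a std::map stands in for PropSet, and the matcher only handles patterns of the form "*suffix" or an exact file name.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical, simplified matcher: handles only "*suffix" or an exact name.
static bool MatchesPattern(const std::string &pattern, const std::string &fileName) {
    if (!pattern.empty() && pattern[0] == '*') {
        const std::string suffix = pattern.substr(1);
        return fileName.size() >= suffix.size() &&
            fileName.compare(fileName.size() - suffix.size(), suffix.size(), suffix) == 0;
    }
    return pattern == fileName;
}

// Visit every property. For keys that start with prefix (e.g. "lexer."),
// split the rest of the key on ';' and test each pattern against fileName.
// Returns the value of the first matching property, or "" if none matches.
static std::string FindByFilePattern(const std::map<std::string, std::string> &props,
        const std::string &prefix, const std::string &fileName) {
    for (const auto &kv : props) {
        const std::string &key = kv.first;
        if (key.compare(0, prefix.size(), prefix) != 0)
            continue;   // not a "lexer." style key, but it still had to be visited
        std::istringstream patterns(key.substr(prefix.size()));
        std::string pattern;
        while (std::getline(patterns, pattern, ';')) {
            if (MatchesPattern(pattern, fileName))
                return kv.second;
        }
    }
    return std::string();
}
```

With the three lexer.* keys from the example, a lookup for 'hash.xx' matches through the second pattern of lexer.*.hh;*.xx. The cost is proportional to the total number of properties, whatever hash function is used for ordinary keyed access.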
Robert:
2) I packaged it in a very contained way - just replacing the previous [inline] definition of PropSet::HashString in propset.h
About the only negative I have is that the code is longer and more complex. For SinkWorld I used Python's string hash which is 7 lines of code.
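For comparison, a string hash in the style of Python's can be sketched in a few lines. This assumes the classic CPython string hash (the multiply-by-1000003 and XOR loop used before Python 3.4), written with 32-bit unsigned arithmetic. The name PythonStyleHash is hypothetical and this is not necessarily the exact code used in SinkWorld.

```cpp
#include <cstddef>
#include <cstdint>

// Sketch of the classic CPython string hash (assumption: pre-3.4 algorithm),
// using 32-bit unsigned arithmetic so that overflow wraps predictably.
inline std::uint32_t PythonStyleHash(const char *s, std::size_t len) {
    if (len == 0)
        return 0;
    const unsigned char *p = reinterpret_cast<const unsigned char *>(s);
    std::uint32_t x = static_cast<std::uint32_t>(p[0]) << 7;
    for (std::size_t i = 0; i < len; i++)
        x = (1000003u * x) ^ p[i];      // multiply, then mix in the next byte
    x ^= static_cast<std::uint32_t>(len);
    return x;
}
```

A bucket index would then be something like PythonStyleHash(key, length) % bucketCount.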
3) The legalities are discussed at some length by the author at http://www.azillionmonkeys.com/qed/weblicense.html. By stripping the already quite sparse comments, I believe the borrowed "raw source code" qualifies under the author's "derivative" license, and Scintilla's own licensing terms do not invalidate this.
Interpreting the conditions is fun but it looks OK.
Properly balancing size and performance is the trick here - since PropSet gets used in a one-size-fits-all way (with large sets but with space still mattering), I have used 256 buckets as a compromise. Going larger just gets better (my own heavily instrumented use of this algorithm was with 4096 buckets), so upping this might make sense. This is really up to Neil, who *may* want to make it a configurable option, so that, say, SciTE could use a larger number like 4096, while more constrained uses of Scintilla would use 256 (or even 32).
Where use level is uncertain, I like hash tables that can grow like SinkWorld's Dictionary which doubles in size once it is 2/3 full. Scintilla only uses simple key names (no filepatterns) and integer values so could actually use a very simple implementation, moving the full featured hash table into SciTE. I wouldn't mind decreasing the 'helper library' part of Scintilla (SString, PropSet, WordList, ...) that are statically linked into client code since it complicates using Scintilla.
Neil
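The growth policy mentioned above (doubling once the table is 2/3 full) could look roughly like the sketch below. It is not SinkWorld's Dictionary: the class name GrowingTable, the use of open addressing with linear probing, std::hash, and the initial size of 8 are all assumptions chosen to keep the example short. Keys are strings and values are integers, as in Scintilla's simple usage.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical growable table: string keys, int values, linear probing.
class GrowingTable {
    struct Slot {
        std::string key;
        int value = 0;
        bool used = false;
    };
    std::vector<Slot> slots;
    std::size_t count = 0;

    // Index of the slot holding key, or of the free slot where it would go.
    // Terminates because the table is never allowed to become full.
    static std::size_t Find(const std::vector<Slot> &table, const std::string &key) {
        std::size_t i = std::hash<std::string>()(key) % table.size();
        while (table[i].used && table[i].key != key)
            i = (i + 1) % table.size();
        return i;
    }

    // Double the number of slots and re-insert every existing entry.
    void Grow() {
        std::vector<Slot> bigger(slots.size() * 2);
        for (const Slot &s : slots) {
            if (s.used)
                bigger[Find(bigger, s.key)] = s;
        }
        slots.swap(bigger);
    }

public:
    explicit GrowingTable(std::size_t initialSlots = 8) :
        slots(initialSlots ? initialSlots : 8) {}

    void Set(const std::string &key, int value) {
        std::size_t i = Find(slots, key);
        if (!slots[i].used) {
            // Grow first if adding this key would take the table past 2/3 full.
            if ((count + 1) * 3 > slots.size() * 2) {
                Grow();
                i = Find(slots, key);
            }
            slots[i].used = true;
            slots[i].key = key;
            count++;
        }
        slots[i].value = value;
    }

    bool Get(const std::string &key, int &value) const {
        const std::size_t i = Find(slots, key);
        if (!slots[i].used)
            return false;
        value = slots[i].value;
        return true;
    }

    std::size_t Capacity() const { return slots.size(); }
    std::size_t Size() const { return count; }
};
```

Starting from 8 slots, the table stays at 8 for the first five keys and doubles to 16 when the sixth is added, so small property sets stay small while large ones avoid long probe sequences.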
