[Zope] performance of textindexng2 vs. zctextindex

2005-07-19 Thread Francis Kelly
I recently installed TextIndexNG2 2.1.1 on a system running Zope 2.7.6 
on Fedora Core 3. I've been running some comparison tests with 
ZCTextIndex, which is what our site currently uses. We're indexing 
around 50,000 objects at the moment. For TextIndexNG2, this is the 
configuration:


Indexed attributes                        keywordSearchSource
Default encoding                          utf-8
Storage                                   StandardStorage
Stemmer                                   english
Splitter: casefolding                     enabled
Splitter: index single characters         disabled
Splitter: max. length of splitted words   64
Splitter: separator characters            .+-_@
Default query parser                      PyQueryParser
Autoexpansion                             disabled
Stopwords                                 english
Normalizer                                European
Use converters                            disabled
Near distance
Left truncation                           disabled



I've been struck that if the number of search hits is high, TextIndexNG2 
is much slower than ZCTextIndex. For example, if I do a search on 
'podcast' (our site deals w/ podcasting) I get about 14,000 hits. 
ZCTextIndex returns the results in about 0.1 seconds; TextIndexNG2 takes 
31 seconds or 300 times longer. In general, the more hits there are, the 
bigger the difference between the two search indexes.
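
For anyone who wants to reproduce this kind of comparison, here is a minimal timing sketch. It is not how the numbers above were measured. The names `time_query` and `slowdown` are made up for illustration, and `search` stands for any callable you write yourself that runs one query against one index and returns a sequence of hits. It uses `time.perf_counter`, so it assumes a modern Python rather than the interpreter shipped with Zope 2.7.

```python
import time


def time_query(search, query, repeats=3):
    """Run search(query) several times; return (best seconds, hit count).

    `search` is a hypothetical callable supplied by the caller, for
    example a small wrapper around a catalog query on one index.
    """
    best = float("inf")
    hits = 0
    for _ in range(repeats):
        start = time.perf_counter()
        results = search(query)
        hits = len(results)  # count the hits inside the timed region
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best, hits


def slowdown(baseline_seconds, candidate_seconds):
    """How many times slower the candidate is than the baseline."""
    return candidate_seconds / baseline_seconds


if __name__ == "__main__":
    # Figures quoted in this message: 0.1 s versus 31 s for the same query.
    print("slowdown factor: %.0f" % slowdown(0.1, 31.0))
```

Taking the best of several runs reduces noise from cache warm-up; timing the first (cold) run separately may also be worth doing, since that is what a visitor sees right after a restart.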


TextIndexNG2 is great: it has many features that we really want and 
perhaps the cost of those features is performance vis-a-vis ZCTextIndex. 
But I'm hoping that maybe I've overlooked an obvious or not-so-obvious 
configuration issue that will enable me to speed up TextIndexNG2.


Thanks for any advice.

Francis Kelly
www.loomia.com



___
Zope maillist  -  Zope@zope.org
http://mail.zope.org/mailman/listinfo/zope
**   No cross posts or HTML encoding!  **
(Related lists - 
http://mail.zope.org/mailman/listinfo/zope-announce

http://mail.zope.org/mailman/listinfo/zope-dev )


Re: [Zope] performance of textindexng2 vs. zctextindex

2005-07-19 Thread Andreas Jung



--On 19 July 2005 17:15:25 -0700 Francis Kelly [EMAIL PROTECTED] wrote:


> I recently installed TextIndexNG2 2.1.1


which is *pretty old*. Take a look at v2.2.0, which has been optimized in 
several ways over time. Consider using StupidStorage, as documented in the 
release notes.




> I've been struck that if the number of search hits is high, TextIndexNG2
> is much slower than ZCTextIndex. For example, if I do a search on
> 'podcast' (our site deals w/ podcasting) I get about 14,000 hits.
> ZCTextIndex returns the results in about 0.1 seconds; TextIndexNG2 takes
> 31 seconds or 300 times longer. In general, the more hits there are, the
> bigger the difference between the two search indexes.


Query speed depends on several things: the query itself, the implementation, 
and the operations that have to be performed during the query. Because of 
some of its functionality, TXNG needs to store much more information than 
ZCTextIndex.

As mentioned above, older versions sometimes did this in a not very efficient way.
You might also look at TextIndexNG V3.

-aj

