On Wednesday, May 8, 2002, at 01:04 PM, [EMAIL PROTECTED] wrote:
We have a collection of about 800 documents, each about 5 KB in size. After indexing with a wildcard index, Xindice takes about 45-60 seconds to answer queries that use the contains() function, which is unacceptable by Internet standards. Is there a way to improve this performance? We have tried indexing all elements, all attributes, and specific elements only, with no success. Also, is there a way to delete an index (other than deleting and then recreating the collection)? We know from experience that Oracle answered similar queries using the CONTAINS clause, with 100 times more XML documents stored in CLOBs, in less than 15 seconds using Oracle interMedia.
Even if Xindice supported full-text indexing, which it doesn't yet, contains() queries would *always* result in a collection scan, because contains() is not a full-text function but a substring function, which requires all text to be scanned. Xindice is *not* a search engine, and though at some point we'll support searches on semi-structured data, for now you're better off just using grep if all you're doing is contains() searches.
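To illustrate the difference between substring matching and full-text lookup, here is a minimal Python sketch. It is not Xindice code: the `docs` dictionary, the function names, and the toy word index are all hypothetical and exist only to show why a substring test has to read every document while a word index can only answer whole-word questions.

```python
import re
from collections import defaultdict

# Hypothetical in-memory "collection": document id -> text content.
docs = {
    "doc1": "Xindice stores XML documents in collections",
    "doc2": "A wildcard index covers element and attribute values",
    "doc3": "Full text search works on whole words",
}


def contains_scan(docs, needle):
    """Substring semantics, like contains(): every document's text is read."""
    return sorted(doc_id for doc_id, text in docs.items() if needle in text)


def build_word_index(docs):
    """Toy full-text index: lowercase word -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return index


def word_lookup(index, word):
    """Whole-word lookup: one dictionary probe, no scan of the documents."""
    return sorted(index.get(word.lower(), set()))


word_index = build_word_index(docs)

# "dex" occurs only inside the word "index", so the scan finds it
# while the word index, which stores whole words, does not.
scan_hits = contains_scan(docs, "dex")
index_hits = word_lookup(word_index, "dex")
whole_word_hits = word_lookup(word_index, "index")
```

In this sketch `contains_scan` touches all three documents no matter what it is asked, which is the collection scan described above, while `word_lookup` is fast only because it gives up the ability to match arbitrary fragments such as "dex".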
--
Tom Bradford - http://www.tbradford.org
Architect - XQRL (XQuery Engine) - http://www.xqrl.com
Apache Xindice (XML Database) - http://xml.apache.org/xindice
Labrador (Web Services Hub) - http://www.notdotnet.org/labrador