Thanks. That is helpful.

-----Original Message-----
From: Tom Bradford [mailto:[EMAIL PROTECTED]
Sent: Wednesday, March 06, 2002 7:01 PM
To: [email protected]
Subject: Re: indexing/xpath query question
On Wednesday, March 6, 2002, at 04:49 PM, Mark J. Stang wrote:

> Sounds like a good sample to me. The collection could be smaller if
> your document is mostly tags. I would guess that the internal storage
> is not just your raw document, but a parsed version and the tags are
> probably represented by a number. If the ratio of your data to your
> tags goes way up, then you will probably see a difference. I don't
> know this for a fact, I don't actually code Xindice, I just play a
> coder on television.

This is pretty much the case. Xindice doesn't store things as a serialized DOM. It creates a tokenized stream, and stores all element and attribute names in a single, global collection that maps those names to integer symbol IDs. The symbol IDs are what actually get stored in the collection and index files, so if the XML is very data-oriented and has a lot of tags and attributes, the removal of those names can reduce the size of the disk image rather well.

--
Tom Bradford - http://www.tbradford.org
Architect - XQRL (XQuery Engine) - http://www.xqrl.com
Apache Xindice (Native XML Database) - http://xml.apache.org/xindice
Project Labrador (Web Services Framework) - http://notdotnet.org
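
For anyone who wants to see the idea in miniature, here is a rough sketch of the scheme described above: names are interned once in a shared symbol table, and the stored token stream carries only integer IDs. This is not Xindice's code or its on-disk format. The SymbolTable class, the tokenize function, and the tuple layout are all made up for illustration, and the parsing is done with Python's standard xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET


class SymbolTable:
    """Hypothetical global name-to-integer mapping (not Xindice's real API)."""

    def __init__(self):
        self.ids = {}     # name -> symbol ID
        self.names = []   # symbol ID -> name

    def intern(self, name):
        # Reuse the existing ID, or hand out the next free integer.
        if name not in self.ids:
            self.ids[name] = len(self.names)
            self.names.append(name)
        return self.ids[name]


def tokenize(xml_text, symbols):
    """Flatten a document into tuples that hold symbol IDs instead of names.

    The tuple layout is an assumption made for this example only.
    """
    tokens = []

    def walk(elem):
        tokens.append(("start", symbols.intern(elem.tag)))
        for attr, value in elem.attrib.items():
            tokens.append(("attr", symbols.intern(attr), value))
        if elem.text and elem.text.strip():
            tokens.append(("text", elem.text.strip()))
        for child in elem:
            walk(child)
            # Text that follows a child element belongs to the parent.
            if child.tail and child.tail.strip():
                tokens.append(("text", child.tail.strip()))
        tokens.append(("end",))

    walk(ET.fromstring(xml_text))
    return tokens


# One table is shared by every document in the (hypothetical) collection.
symbols = SymbolTable()
doc = '<person id="1"><name>Ann</name><name>Bob</name></person>'
tokens = tokenize(doc, symbols)
```

Each distinct name is stored once in the table no matter how often it appears, so the more tag-heavy the documents are, the more the stored stream shrinks relative to the raw text. Adding a second document that uses the same element names would not grow the table at all.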
