jbates      2002/12/03 06:21:20

  Modified:    src/documentation/content/xdocs/dev guide-internals.xml
  Added:       src/documentation/resources/images element.png element.xcf
  Log:
  Started Compressed DOM chapter in Internals Guide
  
  Revision  Changes    Path
  1.3       +96 -1     xml-xindice/src/documentation/content/xdocs/dev/guide-internals.xml
  
  Index: guide-internals.xml
  ===================================================================
  RCS file: /home/cvs/xml-xindice/src/documentation/content/xdocs/dev/guide-internals.xml,v
  retrieving revision 1.2
  retrieving revision 1.3
  diff -u -r1.2 -r1.3
  --- guide-internals.xml       26 Nov 2002 09:20:42 -0000      1.2
  +++ guide-internals.xml       3 Dec 2002 14:21:20 -0000       1.3
  @@ -286,6 +286,12 @@
                      further, the start of the page's data.</p>
                   <section>
                       <title>3.1.1. Paged file header</title>
  +                    <p>The paged file header consists of a number of fixed-length fields.
  +                       Fields longer than one byte are <em>always</em> stored in
  +                       big-endian format, which means the most significant byte is
  +                       written at the lowest address. This is the case regardless of the
  +                       architecture the server process is running on, so your data files
  +                       are portable between architectures.</p>
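To make the byte-order rule concrete, here is a minimal Java sketch (not Xindice code) that writes three header-like fields of 2, 4 and 8 bytes. The field names `pageSize`, `pageCount` and `recordCount` are hypothetical placeholders, not the actual header layout. A newly created `java.nio.ByteBuffer` uses big-endian order regardless of the machine's native order, which is exactly the property that keeps such files portable.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class BigEndianHeaderDemo {
    // Writes hypothetical 2-, 4- and 8-byte header fields, most significant byte first.
    static byte[] writeFields(short pageSize, int pageCount, long recordCount) {
        ByteBuffer buf = ByteBuffer.allocate(Short.BYTES + Integer.BYTES + Long.BYTES);
        // A new ByteBuffer is always BIG_ENDIAN, independent of ByteOrder.nativeOrder().
        buf.putShort(pageSize).putInt(pageCount).putLong(recordCount);
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] bytes = writeFields((short) 0x1000, 0x01020304, 42L);
        // 0x1000 is written as the two bytes 10 00: high byte at the lowest offset.
        System.out.printf("first two bytes: %02x %02x%n", bytes[0], bytes[1]);

        // Reading with an explicit big-endian order gives the same values on any CPU.
        ByteBuffer in = ByteBuffer.wrap(bytes).order(ByteOrder.BIG_ENDIAN);
        System.out.println(in.getShort() + " " + in.getInt() + " " + in.getLong());
        System.out.println("native order of this machine: " + ByteOrder.nativeOrder());
    }
}
```

`java.io.DataOutputStream` and `DataInputStream` also always use big-endian order, so they would work equally well for this.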
                       <figure src="/images/pagedfilehdr.png" alt="File header structure"/>
                       <p>The meaning of the various fields in the file header, whose structure
                          is shown above, is as follows:</p>
  @@ -349,11 +355,100 @@
           </section>
           <section>
               <title>4. XML storage</title>
  +            <p>As we saw in the preceding chapter, the B+-Tree file format allows for the
  +               efficient storage of (name, value) pairs. In this chapter we look at how
  +               such a (name, value) storage facility is used to store the XML content of
  +               all XML documents in a collection.</p>
  +            <p>The principle Xindice uses here is deceptively simple: for every XML <em>document</em>,
  +               Xindice calculates its <em>Compressed DOM</em>, an array of bytes from which the
  +               complete XML document can be reconstructed at any time. The document is then
  +               stored as a (name, value) pair in the B+-Tree, where the name is the name given
  +               to the XML document and the value is its Compressed DOM.</p>
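The storage principle can be sketched in a few lines of Java. This is an illustration under simplifying assumptions, not Xindice's API. A `TreeMap` stands in for the B+-Tree file (both keep their keys sorted). The hypothetical `compress` and `expand` methods only UTF-8 encode the text, as placeholders for real Compressed DOM generation.

```java
import java.nio.charset.StandardCharsets;
import java.util.SortedMap;
import java.util.TreeMap;

public class DocumentStoreSketch {
    // Stand-in for the B+-Tree file: sorted (name, value) pairs.
    private final SortedMap<String, byte[]> tree = new TreeMap<>();

    // Hypothetical placeholder for Compressed DOM generation (here: plain UTF-8).
    static byte[] compress(String xml) {
        return xml.getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical placeholder for rebuilding the document from its bytes.
    static String expand(byte[] value) {
        return new String(value, StandardCharsets.UTF_8);
    }

    // One document = one (name, value) pair; the key is the document name.
    public void insertDocument(String name, String xml) {
        tree.put(name, compress(xml));
    }

    public String getDocument(String name) {
        byte[] value = tree.get(name);
        return value == null ? null : expand(value);
    }

    public static void main(String[] args) {
        DocumentStoreSketch collection = new DocumentStoreSketch();
        collection.insertDocument("susanne",
                "<p:person xmlns:p=\"http://www.xindice.org/Examples/PersonData\"/>");
        System.out.println(collection.getDocument("susanne"));
    }
}
```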
  +            <p>What remains to be explained, then, is how the Compressed DOM of a document
  +               is constructed.</p>
               <section>
                   <title>4.1. The symbol tables</title>
  +                <p>To store XML content in a space-efficient manner, Xindice uses a
  +                   <em>symbol table</em>. This is an XML document which associates a 16-bit
  +                   number with every (QName, namespace URI) pair used as an element or attribute
  +                   name in <em>all</em> XML documents stored in a collection (i.e. there is
  +                   <em>one</em> symbol table per collection).</p>
  +                <p>Consider the following XML document, to be added to a Xindice collection:</p>
  +<source><![CDATA[
  +<?xml version="1.0"?>
  +<p:person xmlns:p="http://www.xindice.org/Examples/PersonData"
  +          gender="female"
  +          xml:lang="fr">
  +    <p:first-name>Susanne</p:first-name>
  +    <p:last-name>Carpentier</p:last-name>
  +    <p:e-mail active="yes">[EMAIL PROTECTED]</p:e-mail>
  +</p:person>
  +]]></source>
  +                <p>When this document is stored in an empty Xindice collection, the following
  +                   symbol table is created:</p>
  +<source><![CDATA[
  +<?xml version="1.0"?>
  +<?xindice-class org.apache.xindice.xml.SymbolTable?>
  +<symbols>
  +    <symbol name="p:first-name" nsuri="http://www.xindice.org/Examples/PersonData" id="4" />
  +    <symbol name="p:e-mail" nsuri="http://www.xindice.org/Examples/PersonData" id="6" />
  +    <symbol name="p:last-name" nsuri="http://www.xindice.org/Examples/PersonData" id="5" />
  +    <symbol name="gender" id="2" />
  +    <symbol name="xml:lang" id="3" />
  +    <symbol name="p:person" nsuri="http://www.xindice.org/Examples/PersonData" id="0" />
  +    <symbol name="active" id="7" />
  +    <symbol name="xmlns:p" nsuri="http://www.w3.org/2000/xmlns/" id="1" />
  +</symbols>
  +]]></source>
  +                <p>As you can see, the symbol table is itself an XML document. It contains a
  +                   <code>symbol</code> element for every (QName, namespace URI) pair used in
  +                   element and attribute names in the XML documents of the collection. The
  +                   <code>id</code> attribute is the 16-bit number Xindice has assigned to that pair.</p>
  +                <p>As documents that use new element and attribute names are added to the
  +                   collection, corresponding entries are added to the collection's symbol table.</p>
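Here is a minimal Java sketch of such a symbol table. It assumes ids are handed out sequentially in order of first use, which matches the example table above (`p:person` got 0, `xmlns:p` 1, `gender` 2, and so on). The class and method names are hypothetical. Xindice's own `org.apache.xindice.xml.SymbolTable` class may differ in its details.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class SymbolTableSketch {
    static final int MAX_SYMBOLS = 1 << 16; // ids must fit in 16 bits

    // A symbol is identified by its QName together with its namespace URI (may be null).
    private static final class Key {
        final String qname;
        final String nsuri;

        Key(String qname, String nsuri) {
            this.qname = qname;
            this.nsuri = nsuri;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return qname.equals(k.qname) && Objects.equals(nsuri, k.nsuri);
        }

        @Override
        public int hashCode() {
            return Objects.hash(qname, nsuri);
        }
    }

    private final Map<Key, Integer> ids = new HashMap<>();

    // Returns the id of (qname, nsuri), assigning the next unused id on first sight.
    public int getSymbolId(String qname, String nsuri) {
        Key key = new Key(qname, nsuri);
        Integer id = ids.get(key);
        if (id == null) {
            if (ids.size() >= MAX_SYMBOLS) {
                throw new IllegalStateException("16-bit symbol id space exhausted");
            }
            id = ids.size();
            ids.put(key, id);
        }
        return id;
    }

    public int size() {
        return ids.size();
    }

    public static void main(String[] args) {
        String pd = "http://www.xindice.org/Examples/PersonData";
        String xmlns = "http://www.w3.org/2000/xmlns/";
        // Names of the example document in document order; gender, xml:lang and
        // active carry no namespace URI, as in the table above.
        String[][] names = {
            {"p:person", pd}, {"xmlns:p", xmlns}, {"gender", null}, {"xml:lang", null},
            {"p:first-name", pd}, {"p:last-name", pd}, {"p:e-mail", pd}, {"active", null}
        };
        SymbolTableSketch table = new SymbolTableSketch();
        for (String[] n : names) {
            System.out.println(n[0] + " -> " + table.getSymbolId(n[0], n[1]));
        }
        // A later document reusing p:person gets the existing id, while a
        // new (hypothetical) name p:phone extends the table.
        System.out.println("p:person -> " + table.getSymbolId("p:person", pd));
        System.out.println("p:phone -> " + table.getSymbolId("p:phone", pd));
    }
}
```

The namespace URI is part of the key, so the same QName bound to two different namespaces gets two different ids.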
  +                <p>Usually, a collection's symbol table is stored like any other XML document in
  +                   the Xindice database. All symbol tables are kept in the <code>system/SysSymbols</code>
  +                   collection, each named after the path of its collection, with underscores (_)
  +                   substituted for the slashes (/) in that path.</p>
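As a small illustration of the naming rule, here is a hypothetical helper. `symbolTableDocumentName` is not a Xindice method. The text above does not say whether the path starts at the database root, so the sketch just applies the substitution to whatever path it is given. The only case the text confirms is `system/SysSymbols` mapping to `system_SysSymbols`.

```java
public class SymbolTableNaming {
    // Document name under which a collection's symbol table is stored in
    // system/SysSymbols: the collection path with '/' replaced by '_'.
    static String symbolTableDocumentName(String collectionPath) {
        return collectionPath.replace('/', '_');
    }

    public static void main(String[] args) {
        System.out.println(symbolTableDocumentName("system/SysSymbols"));
        // Two different (hypothetical) collections that map to the same name:
        System.out.println(symbolTableDocumentName("addr_book/people"));
        System.out.println(symbolTableDocumentName("addr/book_people"));
    }
}
```

Taken literally, the rule is not reversible. Two collections whose names contain underscores can end up with the same document name, as the last two calls show.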
  +                <p>Since <code>system/SysSymbols</code> is itself a Xindice collection, it has a
  +                   symbol table too:</p>
  +<source><![CDATA[
  +<symbols>
  +    <symbol name="symbols" id="0" />
  +    <symbol name="symbol" id="1" />
  +    <symbol name="name" id="2" />
  +    <symbol name="id" id="3" />
  +    <symbol name="nsuri" id="4" />
  +</symbols>
  +]]></source>
  +                <p>Following that naming rule, this symbol table would be stored in an XML
  +                   document named <code>system_SysSymbols</code> in the <code>system/SysSymbols</code>
  +                   collection. Doing so, however, would create a circular dependency, as the
  +                   symbol table of <code>system/SysSymbols</code> would be needed to read itself.
  +                   This particular symbol table is therefore hard-coded in the Xindice
  +                   source code.</p>
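The bootstrap trick can be sketched as follows. The hard-coded map is copied from the `system/SysSymbols` listing above. The `loader` function is a hypothetical stand-in for whatever reads a stored symbol table document, and decoding that document would itself rely on the hard-coded table. This shows the pattern only, not how Xindice's source is organised.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class SymbolTableBootstrap {
    static final String SYS_SYMBOLS = "system/SysSymbols";

    // Hard-coded copy of the system/SysSymbols table (name -> id). None of these
    // names has a namespace, so a plain name key is enough in this sketch.
    static final Map<String, Integer> SYS_SYMBOLS_TABLE;
    static {
        Map<String, Integer> m = new HashMap<>();
        m.put("symbols", 0);
        m.put("symbol", 1);
        m.put("name", 2);
        m.put("id", 3);
        m.put("nsuri", 4);
        SYS_SYMBOLS_TABLE = Collections.unmodifiableMap(m);
    }

    // 'loader' is hypothetical: it reads a stored symbol table document from
    // system/SysSymbols, given that document's name.
    static Map<String, Integer> symbolTableFor(String collectionPath,
                                               Function<String, Map<String, Integer>> loader) {
        if (SYS_SYMBOLS.equals(collectionPath)) {
            // Never go to storage for this one: that is what breaks the loop.
            return SYS_SYMBOLS_TABLE;
        }
        return loader.apply(collectionPath.replace('/', '_'));
    }

    public static void main(String[] args) {
        Function<String, Map<String, Integer>> loader = name -> {
            System.out.println("loading stored table " + name);
            return Collections.emptyMap();
        };
        System.out.println(symbolTableFor(SYS_SYMBOLS, loader).get("nsuri"));
        symbolTableFor("addressbook", loader); // hypothetical collection
    }
}
```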
  +                <p>For any other collection, you can retrieve the symbol table yourself with the
  +                   Xindice command-line tool, giving the collection path (slashes replaced by underscores) as document name:</p>
  +<source>xindice rd -c /db/system/SysSymbols -n [your_collection_path]</source>
               </section>
               <section>
                   <title>4.2. The Compressed DOM</title>
  +                <p>Now that we understand symbol tables, we can look at the way in which
  +                   Xindice generates a byte string from any given XML document.</p>
  +                <p>The key is that Xindice simply walks the XML document recursively, building
  +                   a byte sequence for each node in the tree representation of the XML. A node's
  +                   byte sequence contains the byte sequences of its children, which in turn
  +                   contain those of their own children, and so on.</p>
  +                <p>Xindice thus starts by generating the byte sequence for the document node,
  +                   which triggers generation for the whole XML document.</p>
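The recursion described above can be sketched in Java. The byte layout here is invented purely for illustration and is not Xindice's Compressed DOM format (for element nodes, see 4.2.1). Text is a type byte plus a length-prefixed UTF-8 string. Elements are a type byte, a 16-bit symbol id, a child count and a body length. Attributes are left out entirely. What the sketch does show is the shape of the algorithm: encoding a node first encodes its children and then wraps their bytes, so calling `encode` on the root encodes the whole tree.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class RecursiveEncoderSketch {
    // Node type codes borrowed from DOM (ELEMENT_NODE = 1, TEXT_NODE = 3).
    static final byte ELEMENT = 1;
    static final byte TEXT = 3;

    // Minimal tree: an element (symbol id + children) or a text node.
    static final class Node {
        final byte type;
        final int symbolId;      // id from the collection's symbol table (elements only)
        final String text;       // character data (text nodes only)
        final List<Node> children;

        private Node(byte type, int symbolId, String text, List<Node> children) {
            this.type = type;
            this.symbolId = symbolId;
            this.text = text;
            this.children = children;
        }

        static Node element(int symbolId, Node... children) {
            return new Node(ELEMENT, symbolId, null, Arrays.asList(children));
        }

        static Node text(String text) {
            return new Node(TEXT, -1, text, Collections.<Node>emptyList());
        }
    }

    // Encodes a node; an element's bytes embed the encoded bytes of all its children.
    static byte[] encode(Node node) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DataOutputStream data = new DataOutputStream(out);
        data.writeByte(node.type);
        if (node.type == TEXT) {
            byte[] utf8 = node.text.getBytes(StandardCharsets.UTF_8);
            data.writeInt(utf8.length);
            data.write(utf8);
        } else {
            ByteArrayOutputStream body = new ByteArrayOutputStream();
            for (Node child : node.children) {
                body.write(encode(child)); // recursion: children are encoded first
            }
            data.writeShort(node.symbolId);        // 16-bit symbol id
            data.writeShort(node.children.size()); // number of child nodes
            data.writeInt(body.size());            // lets a reader skip the subtree
            body.writeTo(data);
        }
        data.flush();
        return out.toByteArray();
    }

    // Reads one node back, recursing into its children.
    static Node decode(DataInputStream in) throws IOException {
        byte type = in.readByte();
        if (type == TEXT) {
            byte[] utf8 = new byte[in.readInt()];
            in.readFully(utf8);
            return Node.text(new String(utf8, StandardCharsets.UTF_8));
        }
        int symbolId = in.readUnsignedShort();
        Node[] children = new Node[in.readUnsignedShort()];
        in.readInt(); // subtree length, not needed when reading sequentially
        for (int i = 0; i < children.length; i++) {
            children[i] = decode(in);
        }
        return Node.element(symbolId, children);
    }

    public static void main(String[] args) throws IOException {
        // <p:person><p:first-name>Susanne</p:first-name></p:person>, using ids 0 and 4
        Node doc = Node.element(0, Node.element(4, Node.text("Susanne")));
        byte[] bytes = encode(doc);
        Node back = decode(new DataInputStream(new ByteArrayInputStream(bytes)));
        System.out.println(bytes.length + " bytes, round trip ok: "
                + Arrays.equals(bytes, encode(back)));
    }
}
```

Storing each subtree's length is one way to let a reader skip over children it does not need. Whether and how Xindice does this is determined by the real node layouts, not by this sketch.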
  +                <section>
  +                    <title>4.2.1. Element nodes</title>
  +                    <p>An element node is encoded as shown in the diagram below:</p>
  +                    <figure src="images/element.png" alt="Element compressed DOM format"/>
  +                </section>
  +
  +
               </section>
           </section>
           <section>
  
  
  
  1.1                  xml-xindice/src/documentation/resources/images/element.png
  
        <<Binary file>>
  
  
  1.1                  xml-xindice/src/documentation/resources/images/element.xcf
  
        <<Binary file>>
  
  
