RE: [RT] Xindice 2.0

Lixin Meng 27 Nov 2002 17:10:05 -0000

> - metadata: we need a neutral way to query metadata for
> collections and
> resources. I like David's solution of having a MetaData object with a


I also hope we can have metadata at the database level.
http://marc.theaimsgroup.com/?l=xindice-dev&m=103790372009713&w=2

>
> 2. PERFORMANCE
> Face it: we are slow. We are fair enough for small jobs but we cannot
> stand high loads or huge documents, no matter how accurate
> your indexes

Xindice has its own B-Tree files for data storage and search. Could we
consider leveraging existing RDBMSs? They have been developed and fine
tuned for many years, and they have solved many of the issues that we are
going to tackle (performance, transactions, and security). What needs to be
done is to define an efficient data schema and provide query language
translation (XPath to SQL). A very preliminary thought is attached at the end
of this mail.

n. Cross-collection search.

n+1. Vocabulary mapping

The '/A/B' in namespace one might be equivalent to '/X/Y' in namespace two.
If we allow users to set such rules, the system could return both when
searching for either '/A/B' or '/X/Y'.
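
A rough sketch of how such rules could be applied at query time, in Python.
Everything here is hypothetical (the function names, and the choice to treat
a rule as a plain pair of path strings with the namespace already folded into
the path); it only shows the idea of expanding one path into all paths that
were declared equivalent to it.

```python
def build_equivalents(rules):
    """Map each path to the set of all paths declared equivalent to it.

    rules is an iterable of (path, path) pairs; equivalence is treated as
    symmetric and transitive.
    """
    groups = {}
    for left, right in rules:
        # Merge whatever groups the two paths already belong to.
        merged = groups.get(left, {left}) | groups.get(right, {right})
        for path in merged:
            groups[path] = merged
    return groups


def expand(path, groups):
    """Return every path to search when the user asks for `path`."""
    return sorted(groups.get(path, {path}))


rules = [("/A/B", "/X/Y")]
groups = build_equivalents(rules)
print(expand("/X/Y", groups))
```

A search for either path would then be run once per expanded path and the
results merged.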

Regards,
Lixin

-------------
Store and search arbitrary XML files with an off-the-shelf relational
database.

For the XML file:
         <A a='some attribute value'>
            <B>
                <C c='attribute for c'>Something here</C>
                <D>first D</D>
                <D>second D</D>
                <D d='third'>third D</D>
            </B>
         </A>

Break it down into a list of name-value pairs:

        "/A/@a"         "some attribute value"
        "/A/B/C/@c"     "attribute for c"
        "/A/B/C"        "Something here"
        "/A/B/D[1]"     "first D"
        "/A/B/D[2]"     "second D"
        "/A/B/D[3]"     "third D"
        "/A/B/D/@d"     "third"

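To make the breakdown concrete, here is a minimal Python sketch of it, using
only the standard library. The name flatten is made up for illustration, and
the sketch makes two assumptions of its own: a position such as [2] is added
only when a tag repeats among its siblings, and an attribute path keeps the
position of its owning element (for example /A/B/D[3]/@d) so that it can be
tied back to the right D.

```python
import xml.etree.ElementTree as ET
from collections import Counter

SAMPLE = """
<A a='some attribute value'>
  <B>
    <C c='attribute for c'>Something here</C>
    <D>first D</D>
    <D>second D</D>
    <D d='third'>third D</D>
  </B>
</A>
"""


def flatten(xml_text):
    """Return a list of (path, value) pairs for attributes and text nodes."""
    root = ET.fromstring(xml_text)
    pairs = []

    def walk(elem, path):
        # Each attribute becomes a "<path>/@name" entry.
        for name, value in elem.attrib.items():
            pairs.append((f"{path}/@{name}", value))
        # Non-blank element text becomes an entry for the element itself.
        text = (elem.text or "").strip()
        if text:
            pairs.append((path, text))
        # Count siblings per tag so that repeated tags get a [n] position.
        totals = Counter(child.tag for child in elem)
        seen = Counter()
        for child in elem:
            seen[child.tag] += 1
            step = child.tag
            if totals[child.tag] > 1:
                step += f"[{seen[child.tag]}]"
            walk(child, f"{path}/{step}")

    walk(root, "/" + root.tag)
    return pairs


for path, value in flatten(SAMPLE):
    print(path, "=", value)
```

Mixed content (text that follows a child element) is ignored here; a real
implementation would have to decide how to store it.
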
Save them into tables (there should be more tables, such as one to hold the
original file so we won't have to reconstruct it. I haven't thought about how
to return a sub-tree yet):

        Value table
        ID       Value
        id001    some attribute value
        id002    attribute for c
        id003    Something here
        id004    first D
        id005    second D
        ...      ...

        Meaning/Meta table
        IDRef    DocID    path         Index
        id001    doc1     /A/@a
        id002    doc1     /A/B/C/@c
        id003    doc1     /A/B/C
        id004    doc1     /A/B/D       1
        id005    doc1     /A/B/D       2
        ...      ...      ...
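
As a sketch only, the two tables could be created and filled like this, using
Python's built-in sqlite3 module as a stand-in for whatever RDBMS is chosen.
The table and column names (value_table, meta_table, id_ref, doc_id) and the
helper names are assumptions for illustration; the "Index" column is called
position here because INDEX is an SQL keyword, and IDs are plain integers
rather than the id001 style shown above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE value_table (
    id    INTEGER PRIMARY KEY,
    value TEXT NOT NULL
);
CREATE TABLE meta_table (
    id_ref   INTEGER NOT NULL REFERENCES value_table(id),
    doc_id   TEXT NOT NULL,
    path     TEXT NOT NULL,
    position INTEGER  -- NULL when the element is not repeated
);
CREATE INDEX meta_path ON meta_table(path);
"""


def open_store():
    """Create an in-memory database with the two tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def store(conn, doc_id, rows):
    """Insert (path, position_or_None, value) rows for one document."""
    for path, position, value in rows:
        cur = conn.execute(
            "INSERT INTO value_table (value) VALUES (?)", (value,)
        )
        conn.execute(
            "INSERT INTO meta_table (id_ref, doc_id, path, position) "
            "VALUES (?, ?, ?, ?)",
            (cur.lastrowid, doc_id, path, position),
        )
    conn.commit()


conn = open_store()
store(conn, "doc1", [
    ("/A/@a", None, "some attribute value"),
    ("/A/B/C/@c", None, "attribute for c"),
    ("/A/B/C", None, "Something here"),
    ("/A/B/D", 1, "first D"),
    ("/A/B/D", 2, "second D"),
])
```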

For the query /A/B/[EMAIL PROTECTED]'Something here'], convert it into SQL:
        select ...
        from   ...
        where ... "/A/B/C/@c" and ... "attribute for c"

The SQL will return, for example, the set of 'DocID' values of the matching
documents. The SQL might be complex, but RDBMSs are proven at handling large
amounts of data.
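
One possible shape for the generated SQL, again only a sketch in Python with
sqlite3. It assumes hypothetical tables value_table(id, value) and
meta_table(id_ref, doc_id, path, position), and that the XPath has already
been reduced to a list of (path, value) conditions; the XPath parsing itself
is not shown. Each condition becomes one join, and the conditions are
combined with INTERSECT so that only documents matching all of them remain.

```python
import sqlite3


def demo_connection():
    """Build an in-memory database holding a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE value_table (id INTEGER PRIMARY KEY, value TEXT);
        CREATE TABLE meta_table (
            id_ref INTEGER, doc_id TEXT, path TEXT, position INTEGER
        );
    """)
    conn.executemany("INSERT INTO value_table VALUES (?, ?)", [
        (1, "some attribute value"),
        (2, "attribute for c"),
        (3, "Something here"),
    ])
    conn.executemany("INSERT INTO meta_table VALUES (?, ?, ?, ?)", [
        (1, "doc1", "/A/@a", None),
        (2, "doc1", "/A/B/C/@c", None),
        (3, "doc1", "/A/B/C", None),
    ])
    return conn


def find_docs(conn, conditions):
    """Return the sorted DocIDs that satisfy every (path, value) condition."""
    if not conditions:
        raise ValueError("at least one condition is required")
    one = (
        "SELECT m.doc_id FROM meta_table m "
        "JOIN value_table v ON v.id = m.id_ref "
        "WHERE m.path = ? AND v.value = ?"
    )
    # One SELECT per condition; INTERSECT keeps documents matching all.
    sql = " INTERSECT ".join([one] * len(conditions))
    params = [item for cond in conditions for item in cond]
    return sorted(row[0] for row in conn.execute(sql, params))


conn = demo_connection()
print(find_docs(conn, [("/A/B/C/@c", "attribute for c"),
                       ("/A/B/C", "Something here")]))
```

Note that this only checks that both conditions hold somewhere in the same
document, not on the same element; tying them to one element would need the
position information as well.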
