> - metadata: we need a neutral way to query metadata for collections and
> resources. I like David's solution of having a MetaData object with a

I also hope we can have metadata at the database level.
http://marc.theaimsgroup.com/?l=xindice-dev&m=103790372009713&w=2

>
> 2. PERFORMANCE
> Face it: we are slow. We are fair enough for small jobs but we cannot
> stand high loads or huge documents, no matter how accurate
> your indexes

Xindice has its own B-Tree files for data storage and search. Could we
consider leveraging existing RDBM systems? RDBMSs have been developed and
fine-tuned for many years, and they have already solved many of the issues
we are going to tackle (performance, transactions, and security). What
needs to be done is to define an efficient data schema and provide query
language translation (XPath to SQL). A very preliminary thought is
attached at the end of this mail.

n. Cross-collection search.

n+1. Vocabulary mapping
The '/A/B' in namespace one might be equivalent to '/X/Y' in namespace
two. If we allow users to define such rules, the system can return both
when searching for either '/A/B' or '/X/Y'.

Regards,
Lixin

-------------
Store and search arbitrary XML files with an off-the-shelf relational
database.

For XML file:

<A a='some attribute value'>
  <B>
    <C c='attribute for c'>Something here</C>
    <D>first D</D>
    <D>second D</D>
    <D d='third'>third D</D>
  </B>
</A>

Break it down to a list of name-value pairs:

"/A/@a"       "some attribute value"
"/A/B/C/@c"   "attribute for c"
"/A/B/C"      "Something here"
"/A/B/D[1]"   "first D"
"/A/B/D[2]"   "second D"
"/A/B/D[3]"   "third D"
"/A/B/D/@d"   "third"

Save them into tables. (There should be more tables, such as one holding
the original file so we won't have to reconstruct it. I haven't thought
about how to return a sub-tree.)

Value table
ID      Value
id001   some attribute value
id002   attribute for c
id003   Something here
id004   first D
id005   second D
...     ...

Meaning/Meta table
IDRef   DocID   path        Index
id001   doc1    /A/@a
id002   doc1    /A/B/C/@c
id003   doc1    /A/B/C
id004   doc1    /A/B/D      1
id005   doc1    /A/B/D      2
...     ...     ...

For query:

/A/B/[EMAIL PROTECTED]'Something here'],

Convert it into SQL:

select ...
from ...
where ... "/A/B/C/@c" and ... "attribute for c"

The SQL will return, for example, the set of 'DocID' values for the
matching documents. The SQL might be complex, but RDBMSs are proven at
handling large amounts of data.
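
To make the idea above concrete, here is a minimal sketch in Python, using SQLite only as a stand-in for "some RDBMS". It is an illustration under assumptions, not Xindice code. The table names (value_table, meta_table), the column names, and the helpers shred, store, and find_docs are all hypothetical. The rule that only repeated siblings get an index is my reading of the tables above.

```python
import sqlite3
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical schema mirroring the "Value" and "Meaning/Meta" tables.
SCHEMA = """
CREATE TABLE value_table (id INTEGER PRIMARY KEY, value TEXT);
CREATE TABLE meta_table (
    id_ref INTEGER REFERENCES value_table(id),
    doc_id TEXT,
    path   TEXT,
    idx    INTEGER
);
CREATE INDEX meta_path ON meta_table (path);
"""

DOC = """<A a='some attribute value'>
  <B>
    <C c='attribute for c'>Something here</C>
    <D>first D</D>
    <D>second D</D>
    <D d='third'>third D</D>
  </B>
</A>"""


def open_db():
    """Create an in-memory database with the two tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def shred(elem, path=None, index=None):
    """Yield (path, index, value) rows for elem, its attributes, and its descendants."""
    path = path or "/" + elem.tag
    for name, value in elem.attrib.items():
        # Attributes carry no index, matching the "/A/B/D/@d" pair above.
        yield (path + "/@" + name, None, value)
    text = (elem.text or "").strip()
    if text:
        # Elements holding only whitespace (A and B here) produce no row.
        yield (path, index, text)
    totals = Counter(child.tag for child in elem)
    seen = Counter()
    for child in elem:
        seen[child.tag] += 1
        # Assumption: only repeated siblings get a 1-based index.
        child_index = seen[child.tag] if totals[child.tag] > 1 else None
        yield from shred(child, path + "/" + child.tag, child_index)


def store(conn, doc_id, xml_text):
    """Insert one document's rows into both tables."""
    for path, index, value in shred(ET.fromstring(xml_text)):
        cur = conn.execute("INSERT INTO value_table (value) VALUES (?)", (value,))
        conn.execute(
            "INSERT INTO meta_table (id_ref, doc_id, path, idx) VALUES (?, ?, ?, ?)",
            (cur.lastrowid, doc_id, path, index),
        )


def find_docs(conn, conditions):
    """Return sorted doc ids in which every (path, value) condition matches."""
    if not conditions:
        return []
    one = (
        "SELECT m.doc_id FROM meta_table m "
        "JOIN value_table v ON v.id = m.id_ref "
        "WHERE m.path = ? AND v.value = ?"
    )
    sql = " INTERSECT ".join(one for _ in conditions)
    params = [item for cond in conditions for item in cond]
    return sorted(row[0] for row in conn.execute(sql, params))


if __name__ == "__main__":
    conn = open_db()
    store(conn, "doc1", DOC)
    print(find_docs(conn, [("/A/B/C/@c", "attribute for c"),
                           ("/A/B/C", "Something here")]))
```

Note that this matches at the document level only: the INTERSECT does not check that both conditions hold on the same C element. A real XPath-to-SQL translation would need some element identity in the meta table to do that, and that is left open here, as is returning sub-trees.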