Re: Indexing store

Stefano Mazzocchi Sun, 18 Jan 2004 14:30:57 -0800

On 18 Jan 2004, at 12:34, Nick Reddel wrote:

Hi everyone

I have developed an indexing store (subclassing the j2ee/rdbms
adapter) which indexes properties, NOT content. At the moment it is just for
MySQL 4.1+, but presumably easily transferable to any other RDBMS that
supports subselects.

Way cool.

But...the code is horrible and undocumented,

eheh, what code isn't ;-)

because as soon as I got it working I then moved on to a "virtual" store (i.e. a relational filesystem, which will be the sweetest thing as soon as I've sorted out some sort of leak which means the connections don't go back to the pool).

The reason I did some of the strange things I did, which I've marked below with (!), is that I've made a refreshing Netbeans webdav client, which for the sake of completeness needs to know if absolutely any modifications have happened to a given uri/object/file, particularly lock changes. My idea was to have a group of authors working on the same set of files, only allowing offline content changes if the file's checked out, rather than the local/remote model that's standard. I'll explain why if anyone's interested, but in another mail.

This seems like a job for a DeltaV-aware client.

So, a few notes/queries on the implementation. I can upload my sources, but I'd need a little time to make them slightly less embarrassing, and any input/critiques I can get on the solution below

ok

(architecturally I find
Slide a completely impossible act to follow. Which makes me a little
shy).

Slide is a complex beast, probably too complex in some areas I agree.

My last update from CVS was early December, and I modified some
org.apache.slide files as needed - I'd imagine similar modifications
would be needed for any indexing implementation:
Search languages:
-----------------
There is API support for different search languages, but no actual
implementation.
In Domain.xml, I added a new parameter for search languages and patched
SearchImpl.java as per attachment
<parameter name="search_language_classes">org.apache.slide.search.basic.BasicSearchLanguage,
com.ella.alexis.chakriya.slide.indexing.MultipleSearchLanguage</parameter>

</configuration>
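
[Editor's sketch] The SearchImpl patch itself is not shown here, so the following is only a minimal illustration of how a comma-separated value like the one in "search_language_classes" might be split into class names. The class SearchLanguageConfig and the method parseClassNames are hypothetical names, not part of Slide.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not Slide's actual SearchImpl code.
final class SearchLanguageConfig {

    // Splits a comma-separated parameter value into trimmed class names,
    // skipping empty entries left by stray commas or line breaks.
    static List<String> parseClassNames(String parameterValue) {
        List<String> names = new ArrayList<String>();
        if (parameterValue == null) {
            return names;
        }
        for (String part : parameterValue.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String value = "org.apache.slide.search.basic.BasicSearchLanguage,\n"
                + "com.ella.alexis.chakriya.slide.indexing.MultipleSearchLanguage";
        for (String name : parseClassNames(value)) {
            System.out.println(name);
        }
    }
}
```

Each resulting name could then be loaded reflectively by whatever registers the search languages; that step is left out because it depends on the patched code.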
Property indexing
-----------------
Two separate issues: creation of the indexing table(s), and index
updating.
Index table creation/rebuilding.

- the better way to do this would be through Slide Admin. But

1. is slide admin maintained? (no). do you want someone to maintain it?

In my experience a command-line tool is much more useful than a web-based tool, because sysadmins are much more worried about web interfaces through SSL than about command-line based ones over SSH.

2. is there any way in Tomcat/servlet containers to define webapp dependencies, and hence classloader access across webapps, rather than "copy it all into common", which sort of defeats the purpose of webapps?

no. servlet contexts are meant to be completely isolated (which is the reason why webapp composability cannot be achieved [and the reason why i designed cocoon blocks, but this is another story])

My actual implementation uses a parameter in the nodestore element which tells the adapter whether or not to rebuild the index table on adapter initialisation.

Why would you want to do that? [curious]

The ideal way to support index building would, I think, be to define a
"resource" path in the filesystem itself, which the indexing services
use to determine the data types and tables for the various properties,
i.e. a folder

/files/_resources

containing

dav.xsd
myXMLNS1.xsd
...

which maps from xml data types to the various db types....and, of
course, allows validation checks if desired. (jakarta-db-ojb??)

hmmm, not sure about this... sounds a little hacky to me.

I haven't done it this way, although on my client I already do, so I
will definitely move it across. Instead, I extended the old QName table
with [table_name] and [data_type] columns. Table (re)building goes:
1. update qname so it contains all defined properties
2. add any columns to the table(s) for which a qname record has a
defined data_type value.
3. index all files.
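
[Editor's sketch] Step 2 above could look roughly like the following, assuming each QName row carries a column name plus the [table_name] and [data_type] values. The QNameRecord shape, the class name IndexTableBuilder and the example table name are assumptions for illustration; the actual schema is not given in this mail.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: generates the ALTER TABLE statements for step 2.
final class IndexTableBuilder {

    // One row of the extended QName table (assumed shape).
    static final class QNameRecord {
        final String columnName;
        final String tableName; // null when the property is not indexed
        final String dataType;  // null when no data type is defined

        QNameRecord(String columnName, String tableName, String dataType) {
            this.columnName = columnName;
            this.tableName = tableName;
            this.dataType = dataType;
        }
    }

    // Returns one ADD COLUMN statement per record that has both a table
    // and a data type; records without them are skipped.
    // Identifiers are assumed to come from trusted configuration.
    static List<String> addColumnStatements(List<QNameRecord> records) {
        List<String> sql = new ArrayList<String>();
        for (QNameRecord r : records) {
            if (r.tableName == null || r.dataType == null) {
                continue;
            }
            sql.add("ALTER TABLE " + r.tableName
                    + " ADD COLUMN " + r.columnName + " " + r.dataType);
        }
        return sql;
    }

    public static void main(String[] args) {
        List<QNameRecord> records = new ArrayList<QNameRecord>();
        records.add(new QNameRecord("getcontentlength", "PROPERTY_INDEX", "BIGINT"));
        records.add(new QNameRecord("displayname", null, null));
        for (String statement : addColumnStatements(records)) {
            System.out.println(statement);
        }
    }
}
```

A real rebuild would also need to skip columns that already exist before executing the statements.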
(!) Rather than use interceptors, every method in the adapter that modifies the underlying RDBMS also reindexes the file. Principally because my client is mostly interested in the "meta-meta" property of whether anything (acl, lock, properties) has changed, so as to cache successfully. Change propagation to children I simply ignored (because for a relational system, which is what I was after, there's no such thing as a definitive parent/child relationship).

I'm not sure I understand what you mean here, can you elaborate more?

SEARCH method
-------------

? How does the BasicQuery implementation access the datasource? I
re-declared the "getConnection" method as protected in a subclass of
j2eeStore, in the query implementation's package. Seems reasonable to me.

sounds reasonable to me too.

? How to search for meta-meta properties (i.e. change/delete
notification).
        I defined some new properties which are only evaluated for
infinite-depth searches on the filesystem root ("/").
        It's wrong and inelegant; among other things, it means SEARCH
gives you info which PROPFIND doesn't, because in Slide the store has
much greater control over SEARCH than PROPFIND.

Ideally, I think, the real solution would be a PropertyProvider that
plugs into the store directly. But whole slabs of
src/webdav/server/org/apache/slide/webdav/util

use statements like the following:

AbstractResourceKind.isComputedProperty(propName)

which means any subclassing of the property provision section of Slide
would mandate refactoring of these classes.

Not sure I follow here either, can you elaborate more?

Performance
-----------
On my machine (a battered laptop with specs blah), I ran the following
SEARCH, the equivalent of a PROPFIND, on a folder containing 500 files:

<d:searchrequest xmlns:d="DAV:"
    xmlns:alx="http://www.ella-associates.org/alexis/">
<alx:multiplesearch xmlns:alx="http://www.ella-associates.org/alexis/">
    <d:basicsearch xmlns:d="DAV:">
        <d:select>
            <d:prop>
                <d:displayname />
                <d:getcontentlength />
                <d:getlastmodified />
            </d:prop>
        </d:select>
        <d:from>
            <d:scope>
                <d:href>/slide/files/documents/bigf</d:href>
                <d:depth>1</d:depth>
            </d:scope>
        </d:from>
        <d:where>
            <d:gt>
                <d:prop>
                    <d:resource-id />
                </d:prop>
                <d:literal>0</d:literal>
            </d:gt>
        </d:where>
    </d:basicsearch>
</alx:multiplesearch></d:searchrequest>


***alx:multiplesearch bundles a lot of search requests, which means a
client wanting change info on a bunch of folders only has to send one
SEARCH

Way cool the multiple search!

PROPFIND (1st time, revisiondescriptor cache building)
16294 ms
PROPFIND (2nd time, from cache)
9003 ms
SEARCH
772 ms
(and some of that's just the pipes waiting on each other)

wow, impressive performance improvement, but doesn't surprise me since the current search algorithm is linear and cannot scale.

Which sort of begs the question, why not allow PROPFIND to just call a
SEARCH, if appropriate? (And yes, I have got classes to generate a
SEARCH body).

that was what I was planning to do as well, yes.
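
[Editor's sketch] Nick's classes for generating a SEARCH body are not included in this mail. The following is a minimal, hypothetical version of the idea: turn the pieces of a PROPFIND (scope href, depth, requested DAV: properties) into a basicsearch request. SearchBodyBuilder and its methods are assumed names. The sketch handles only properties in the DAV: namespace and leaves out the where clause used in the example request above.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: builds a DASL basicsearch body from PROPFIND-style input.
final class SearchBodyBuilder {

    // href:     scope of the search, e.g. the collection being listed
    // depth:    "0", "1" or "infinity"
    // davProps: local names of DAV: properties to select
    static String basicSearch(String href, String depth, List<String> davProps) {
        StringBuilder sb = new StringBuilder();
        sb.append("<d:searchrequest xmlns:d=\"DAV:\">");
        sb.append("<d:basicsearch>");
        sb.append("<d:select><d:prop>");
        for (String prop : davProps) {
            sb.append("<d:").append(prop).append("/>");
        }
        sb.append("</d:prop></d:select>");
        sb.append("<d:from><d:scope>");
        sb.append("<d:href>").append(escape(href)).append("</d:href>");
        sb.append("<d:depth>").append(escape(depth)).append("</d:depth>");
        sb.append("</d:scope></d:from>");
        sb.append("</d:basicsearch>");
        sb.append("</d:searchrequest>");
        return sb.toString();
    }

    // Escapes the characters that are unsafe in XML text content.
    static String escape(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.println(basicSearch(
                "/slide/files/documents/bigf", "1",
                Arrays.asList("displayname", "getcontentlength", "getlastmodified")));
    }
}
```

A PROPFIND handler could fall back to the normal code path whenever the request asks for something the indexed search cannot answer.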

"KNOWN ISSUES"
These aren't truly critical, because anything this search implementation can't handle it just throws back to the default implementation.

- doesn't allow ACL/Lockdiscovery in the select
- only supports depth 1 (or infinity for meta-meta)

Can depth n be implemented with n queries on depth 1? What about joins on parent/child relationships, letting the DB do its query optimization job?
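
[Editor's sketch] The "n queries on depth 1" idea can be pictured as a level-by-level expansion, where each level costs one depth-1 lookup over the whole set of parents found so far. The ChildLookup interface below stands in for such a query (for example one SELECT with the parents in an IN clause); it and the class DepthExpansion are hypothetical, and the sketch assumes a tree without cycles, so bindings would need extra care.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: depth-n traversal built from repeated depth-1 lookups.
final class DepthExpansion {

    // Stand-in for a single depth-1 query: given a set of parent URIs,
    // return the URIs of all their direct children.
    interface ChildLookup {
        List<String> childrenOf(List<String> parents);
    }

    // Collects the root and everything up to 'depth' levels below it,
    // issuing at most one lookup per level.
    static List<String> collect(String root, int depth, ChildLookup lookup) {
        List<String> result = new ArrayList<String>();
        result.add(root);
        List<String> frontier = new ArrayList<String>();
        frontier.add(root);
        for (int level = 0; level < depth && !frontier.isEmpty(); level++) {
            List<String> next = lookup.childrenOf(frontier);
            result.addAll(next);
            frontier = next;
        }
        return result;
    }
}
```

The loop stops early when a level comes back empty, so a shallow folder never costs the full n lookups.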

- (Critical) doesn't resolve links/binding (but doable completely in
SQL, I just need to check out BindingStore)

this would be a must-have if we start using binding seriously.

///////////////////////

Any feedback on the above would be truly appreciated. Particularly from
Stefano Mazzocchi.

Hope this helps.

Anyway, thanks much for your contribution; it is greatly appreciated!

--
Stefano.

