On Saturday, November 29, 2003, at 11:47 AM, Stefano Mazzocchi wrote:
I would AND together a PrefixQuery for URI "/files/whatever" (allowing it to search a sub-tree rooted there) with a RangeQuery on field "contentlength" for values 40000 and greater.

Hmmm. This seems to have O(n) complexity on scoping. I was aiming for O(1) complexity on scoping and O(n) on the rest (WHERE and ORDER). O(n) on scoping is not going to perform well enough on very large collections of documents.

I'm not sure what you mean by scoping here (the URI path?), but Lucene can very quickly narrow a range of considered documents down based on the lexicographic range of the terms being queried. For example, if contentlength is a term, only the documents that have a value greater than 40000 would be considered, and getting to that list of documents is quite rapid.
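To make the range narrowing concrete, here is a minimal Python sketch of the idea. It is a toy model, not Lucene's API: the `postings` dictionary, the `docs_at_least` helper, and the sample values are all hypothetical. It assumes the lengths are indexed as zero-padded strings, because terms are compared lexicographically and unpadded numbers would sort incorrectly.

```python
from bisect import bisect_left

# Hypothetical term dictionary for a "contentlength" field: term -> doc ids.
# Zero-padding makes lexicographic order agree with numeric order.
postings = {
    "0000001200": [3],
    "0000039999": [1],
    "0000040000": [2, 5],
    "0000750000": [4],
}
terms = sorted(postings)  # a real index keeps its terms sorted on disk

def docs_at_least(min_value, width=10):
    """Return sorted doc ids whose term is >= min_value.

    Only the terms inside the range are visited; documents below the
    threshold are never looked at.
    """
    start = bisect_left(terms, str(min_value).zfill(width))
    hits = set()
    for term in terms[start:]:
        hits.update(postings[term])
    return sorted(hits)

print(docs_at_least(40000))
```

The binary search finds the first qualifying term, so the cost depends on the number of terms in the range rather than on the total number of documents.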


If you're doing full-text searching combined with these types of conditions, and want the results ordered by how well the documents match your query, then Lucene will shine.

Yes, but in that case, I think Lucene should handle its own query language through a specific DASL implementation. Using a text-oriented search engine for relational stuff is, IMO, a little abusive.

Definitely understood. Keep in mind that Lucene is my hammer at the moment and the world is my nail :) So I'll put forth a very Lucene-centric take on things, since I'm immersed in it right now.


Traditional relational database type of queries with ORDER BY clauses don't map as well. Ordering, though, can be applied after the query results are returned in this case as you will want to collect all documents that match the query anyway. I'd almost be willing to bet that Lucene will beat most, if not all, relational databases here especially in this case where the hierarchy is being recursively traversed.
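As a rough illustration of ordering after collection, the following Python sketch intersects two sets of hits (the AND of the two queries discussed earlier) and only then sorts the survivors. Everything here is hypothetical: the `and_then_order` helper, the `stored` field values, and the document ids are made up for the example and do not come from Lucene.

```python
def and_then_order(prefix_hits, range_hits, stored, order_by):
    """AND two hit lists together, then order the matches by a stored field."""
    matching = set(prefix_hits) & set(range_hits)  # documents matching both queries
    return sorted(matching, key=lambda doc_id: stored[doc_id][order_by])

# Hypothetical stored fields, keyed by doc id.
stored = {
    2: {"displayname": "zebra.txt"},
    3: {"displayname": "apple.txt"},
    5: {"displayname": "mango.txt"},
}

# Docs 2, 3, 5 are under the path; docs 2, 4, 5 are large enough.
print(and_then_order([2, 3, 5], [2, 4, 5], stored, "displayname"))
```

The sort touches only the documents that matched both conditions, which is why doing it after the query is usually acceptable.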

Not sure about that.


Lucene is not relational, so it will have to scan the entire list of documents to determine whether each one belongs to a particular scope.

Again, I'm not sure what you mean by scope, but unless you're doing something like a WildcardQuery or FuzzyQuery, it will not have to scan all documents. Lucene "scans" by term, not by document. It is an inverted index, and walking documents is not something done when searching, generally speaking; it walks the requested terms and then gives back the document IDs that match the query.
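For readers unfamiliar with the term, a minimal inverted index can be sketched in a few lines of Python. This is a generic illustration of the data structure, not Lucene's implementation; `build_index`, `search_all`, and the sample documents are hypothetical names and data.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to the set of doc ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search_all(index, *terms):
    """Return doc ids containing every term, using one lookup per term."""
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings)) if postings else []

docs = {
    1: "lucene inverted index",
    2: "relational index",
    3: "lucene query",
}
index = build_index(docs)
print(search_all(index, "lucene", "index"))
```

The search never iterates over `docs`; it looks up each requested term and combines the posting sets, which is the sense in which the scan is by term rather than by document.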


If you index into a Lucene document a field called "path" that looks like filesystem paths: "/files", "/files/whatever", "/files/whatever/..." and then use a PrefixQuery, only the terms that begin with the path specified are enumerated - making it a recursive query essentially, but in a very rapid term range enumeration under the covers.
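The prefix enumeration can be sketched the same way. This Python toy model assumes a sorted list of "path" terms; `path_postings` and `prefix_docs` are hypothetical names, and the code illustrates the technique rather than Lucene's PrefixQuery itself.

```python
from bisect import bisect_left

# Hypothetical "path" field: term -> doc ids.
path_postings = {
    "/docs/readme": [7],
    "/files": [1],
    "/files/other/b": [4],
    "/files/whatever": [2],
    "/files/whatever/a": [3],
    "/files/whatever/a/deep": [5],
    "/filesystem": [6],
}
path_terms = sorted(path_postings)

def prefix_docs(prefix):
    """Return sorted doc ids for every term that starts with prefix."""
    hits = []
    i = bisect_left(path_terms, prefix)  # jump to the first candidate term
    while i < len(path_terms) and path_terms[i].startswith(prefix):
        hits.extend(path_postings[path_terms[i]])
        i += 1  # matching terms are contiguous in sorted order
    return sorted(hits)

print(prefix_docs("/files/whatever"))
```

Because all terms sharing a prefix sit next to each other in sorted order, the loop stops at the first non-matching term. Note that a bare prefix such as "/files" also matches "/filesystem" in this sketch; querying with a trailing slash ("/files/") restricts the match to descendants.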

Note that since DASL queries will be more or less the same all the time, it is possible to think of a relational model that will optimize them greatly.

Sure. Again, I agree that Lucene may be overkill. But I enjoy this discourse to see where Lucene may fall short. Thus far, I still think Lucene can do the job although certainly not as straightforwardly as a SQL-like query on a relational model.


Erik

