On 19 Jan 2004, at 15:12, Michael Oliver wrote:


On Mon, 2004-01-19 at 06:32, Stefano Mazzocchi wrote:

I personally wouldn't know how to make use of a query against full text
*and* properties. This is because such a query looks weird to me:
full-text is the least structured thing possible (get me everything, but I
don't know where) while properties tend to be very much structured
(last modified time, author, and so on).


There is a decades-long discussion on what is data and what is metadata
and I don't want to touch that with a stick, but I think that if you
need to do full-text search on your metadata there is something wrong.

Stefano, with all due respect, there is nothing wrong with a full-text
search on metadata, because metadata in this case can be any properties
of any of the resources in the repository, and that metadata can be
free-form text.

Well, this is because I try to avoid having metadata that can be free-form text, but as I said, this is my way and I don't want to impose it on others.


consider a search query like

doctype="memo" and description contains "Fire Stefano" and contents
contains "January"

I would think that this schema is not appropriate. A description is part of content, not metadata. But it's like arguing about whether something should be an element or an attribute... sometimes it's just subjective.


doctype and description are properties with string values that would be
indexed and matched with the same index as the contents.

So, are you suggesting that we index everything? [not critical, just curious]
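
To make the query above concrete, here is a minimal sketch of a single index that holds both properties and contents as named fields. It is only an illustration in plain Python, not the Slide or Lucene API: the `FieldedIndex` class, its `equals` and `contains` methods, and the three sample documents are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase a string and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def has_phrase(tokens, phrase):
    """True if the token list `phrase` occurs contiguously in `tokens`."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


class FieldedIndex:
    """Hypothetical toy index: properties and contents share one posting map."""

    def __init__(self):
        self.postings = defaultdict(set)  # (field, term) -> set of doc ids
        self.docs = {}                    # doc id -> {field: raw value}

    def add(self, doc_id, fields):
        self.docs[doc_id] = fields
        for field, value in fields.items():
            for term in tokenize(value):
                self.postings[(field, term)].add(doc_id)

    def equals(self, field, value):
        """Structured match: the whole property value must be equal."""
        return {d for d, f in self.docs.items() if f.get(field) == value}

    def contains(self, field, phrase):
        """Full-text match: the phrase must occur inside the field."""
        terms = tokenize(phrase)
        if not terms:
            return set()
        # Narrow down with the postings, then verify word order.
        candidates = set.intersection(
            *[self.postings.get((field, t), set()) for t in terms]
        )
        return {d for d in candidates
                if has_phrase(tokenize(self.docs[d][field]), terms)}


idx = FieldedIndex()
idx.add(1, {"doctype": "memo",
            "description": "Proposal to fire Stefano",
            "contents": "Meeting scheduled for January 2004"})
idx.add(2, {"doctype": "memo",
            "description": "Stefano on fire safety",
            "contents": "January drill"})
idx.add(3, {"doctype": "report",
            "description": "Fire Stefano",
            "contents": "January"})

# doctype="memo" and description contains "Fire Stefano"
# and contents contains "January"
hits = (idx.equals("doctype", "memo")
        & idx.contains("description", "Fire Stefano")
        & idx.contains("contents", "January"))
print(sorted(hits))  # [1]
```

In this sketch the structured test (`equals`) and the full-text test (`contains`) are just two predicates over the same documents, so combining them is a set intersection. Whether a real store should index every property this way is exactly the open question here.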


Not everybody uses the Database Stores; some actually prefer the XML
Stores, so an index of the XML should be full text, yes?

This is actually a good question and I don't have a definitive answer. Indexing all text() nodes might be good, but what about namespaces? What about attributes? Should we care?
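
One possible answer, sketched here under assumptions rather than proposed as the design: index every text node under the namespace-qualified name of its enclosing element, and index attribute values under a separate `element/@attribute` field. The `index_xml` function and the field naming scheme are hypothetical; the code uses only Python's standard `xml.etree.ElementTree`, which reports qualified names as `{namespace-uri}localname`.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

XHTML = "http://www.w3.org/1999/xhtml"
SVG = "http://www.w3.org/2000/svg"


def index_xml(xml_string):
    """Map each term to the set of fields (qualified names) it occurs in."""
    postings = defaultdict(set)
    root = ET.fromstring(xml_string)
    for elem in root.iter():
        field = elem.tag  # "{namespace-uri}localname"
        # Text directly inside this element: its leading text plus the
        # tail text that follows each of its children.
        chunks = [elem.text] + [child.tail for child in elem]
        for chunk in chunks:
            for term in re.findall(r"\w+", (chunk or "").lower()):
                postings[term].add(field)
        # Attribute values go into their own fields, so a query can
        # include or ignore them.
        for name, value in elem.attrib.items():
            for term in re.findall(r"\w+", value.lower()):
                postings[term].add(f"{field}/@{name}")
    return postings


doc = f"""<html xmlns="{XHTML}" xmlns:svg="{SVG}">
  <body>
    <p title="greeting">Hello <b>big</b> world</p>
    <svg:text>Hello</svg:text>
  </body>
</html>"""

postings = index_xml(doc)
for field in sorted(postings["hello"]):
    print(field)
```

With this layout the word "Hello" is found under both the XHTML `p` element and the SVG `text` element, so a search can be restricted to one namespace or span all of them. It says nothing about ranking or about formats that need a specific indexer.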


A while ago, thinking about this, I proposed the addition of a numerical namespace on the lucene mailing list, but the suggestion didn't catch on [I also have the impression they didn't get my point, but it was low priority so I dropped the subject].

I think that indexing an XHTML file is relatively easy. Indexing an FO file with inlined SVG images might not be so straightforward, or lead to the same quality of results, without a specific indexer. There might be a general way to index XML content, but it's not as easy as it seems, and lucene is designed for mono-dimensional content, not multi-dimensional content.

But I'm wide open to ideas in this area.

--
Stefano.

