Re: Proposal : index store - Lucene

Stefano Mazzocchi Tue, 20 Jan 2004 04:46:33 -0800

On 19 Jan 2004, at 17:00, Michael Oliver wrote:

On Mon, 2004-01-19 at 08:31, Stefano Mazzocchi wrote:
On 19 Jan 2004, at 15:12, Michael Oliver wrote:
On Mon, 2004-01-19 at 06:32, Stefano Mazzocchi wrote:
I personally wouldn't know how to make use of a query against full text
*and* properties. This is because such a query looks weird to me:
full-text is the least structured form possible (get me everything but I
don't know where) while properties tend to be very much structured
(last modified time, author, and so on).
There is a decades-long discussion on what is data and what is metadata and I don't want to touch that with a stick, but I think that if you need to do full-text search on your metadata, there is something wrong.
Stefano, with all due respect, there is nothing wrong with a full-text search on metadata, because metadata in this case can be any property of any resource in the repository, and that metadata can be free-form text.
Well, this is because I try to avoid having metadata that can be free
form text, but as I said, this is my way and I don't want to impose it
on others.
Well, as long as we CAN have properties that are free-form text, we can't avoid them.

Very true.

consider a search query like

doctype="memo" and description contains "Fire Stefano" and contents contains "January"
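To make the semantics of that query concrete, here is a minimal sketch. It assumes resources are plain Python dicts with a `contents` key plus arbitrary string properties. The helper names are hypothetical and this is not Slide's API. `doctype="memo"` is an exact, structured match, while the two `contains` clauses are full-text tests on a property and on the content:

```python
# Hypothetical resource model: properties and content are all plain strings.
def equals(resource, field, value):
    """Exact (structured) match on a property value."""
    return resource.get(field) == value

def contains(resource, field, phrase):
    """Naive case-insensitive full-text test: phrase occurs in the field text."""
    return phrase.lower() in resource.get(field, "").lower()

def matches(resource):
    # doctype="memo" and description contains "Fire Stefano"
    # and contents contains "January"
    return (equals(resource, "doctype", "memo")
            and contains(resource, "description", "Fire Stefano")
            and contains(resource, "contents", "January"))

resources = [
    {"doctype": "memo", "description": "Fire Stefano now",
     "contents": "Effective January 31."},
    {"doctype": "memo", "description": "Promote Stefano",
     "contents": "Effective January 31."},
    {"doctype": "report", "description": "Fire Stefano",
     "contents": "January figures"},
]

hits = [i for i, r in enumerate(resources) if matches(r)]
print(hits)  # [0]
```

The substring test is deliberately naive. A real full-text engine would tokenize and analyze the text rather than compare raw substrings.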


I would think that this schema is not appropriate: a description is
part of the content, not metadata. But it's like arguing about whether
something should be an element or an attribute... sometimes it's just
subjective.

No, I don't think so. Metadata IS data about data, eh?

Right, and for this very reason metadata is data... this "about" is the key: its semantic meaning is relative, not absolute.

And a "description" can't be anything else. Surely you don't think that the "description" of a binary file stored in Slide (the content), which is text, is part of that content?

eheh, we can agree to disagree then: my content could be something like

 <image>
  <bits>
   ... base64-encoded bitstream of the image ...
  </bits>
  <description>
   this is an image about a horse
  </description>
 </image>

or it could be a GIF image (which *does* have optional text-based descriptions at the end of it).
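When the description lives inside the content, as in the XML example above, an indexer has to pull it out of the document itself before it can be searched like a property. The following minimal sketch uses Python's standard `xml.etree.ElementTree` and assumes content shaped like that `<image>` example. The `extract_description` helper is hypothetical:

```python
import base64
import xml.etree.ElementTree as ET

def extract_description(xml_text):
    """Return the text of the <description> child, or None if absent/empty."""
    root = ET.fromstring(xml_text)
    node = root.find("description")
    return node.text.strip() if node is not None and node.text else None

# Build content shaped like the <image> example (fake bytes, base64-encoded).
bits = base64.b64encode(b"\x00\x01fake-image-bytes").decode("ascii")
doc = (
    "<image>"
    f"<bits>{bits}</bits>"
    "<description>this is an image about a horse</description>"
    "</image>"
)

print(extract_description(doc))  # this is an image about a horse
```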

In JCR we have nodes and properties, but we don't specify that nodes are data and properties are metadata. Some properties are read-only and auto-generated by the container (for example last-modified-time or creation-time), but they could also be things like width/height for images. Those are auto-generated because the node type is an image: the container reacts to this and extracts the information.
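The container behaviour described above can be sketched as a hook that fires on the node type. This is not the JCR API. The `Node` class, the `"image"` node type name and the `width`/`height` property names are all hypothetical. The only real format detail used here is the GIF header layout: a 6-byte signature (`GIF87a` or `GIF89a`) followed by the logical screen width and height as 16-bit little-endian integers.

```python
import struct

class Node:
    """Hypothetical repository node: content bytes plus a property map."""

    def __init__(self, node_type, content):
        self.node_type = node_type
        self.content = content
        self.properties = {}
        self._auto_generate()

    def _auto_generate(self):
        # The container reacts to the node type: for GIF images it extracts
        # width/height from the logical screen descriptor (bytes 6..9).
        if self.node_type == "image" and self.content[:6] in (b"GIF87a", b"GIF89a"):
            width, height = struct.unpack("<HH", self.content[6:10])
            self.properties["width"] = width
            self.properties["height"] = height

# Minimal GIF-like header: signature, width=320, height=200, then filler bytes.
header = b"GIF89a" + struct.pack("<HH", 320, 200) + b"\x00" * 3
node = Node("image", header)
print(node.properties)  # {'width': 320, 'height': 200}
```

A real container would also stop clients from overwriting such properties. This sketch leaves that out.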

Anyway, this is an academic point, as I do agree that if people store strings in properties, we need a way to search them using full-text.

Slide/WebDAV properties that can be created and saved by the user are all about categorizing and describing the content, almost for the express purpose of being able to find the right content, and therefore they should very much be part of the search mechanism.

Very much agreed, yes.

doctype and description are properties with string values that would be indexed and matched with the same index as the contents.
So, are you suggesting that we index everything? [not critical, just
curious]
Absolutely: if someone wants to save some piece of information, they will
want to retrieve it and search for it.
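"Index everything" could look like the minimal sketch below. It uses a single inverted index in which every property and the content go through the same tokenizing path, instead of testing each resource one by one. Terms are qualified by field name, so `description:fire` and `contents:fire` are separate postings. Whitespace/word tokenization and the `field:term` key scheme are simplifying assumptions. In a Lucene-based store, analyzers and per-field terms would play these roles.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case word tokens; a stand-in for a real analyzer."""
    return re.findall(r"\w+", text.lower())

class FieldIndex:
    """One inverted index shared by properties and content alike."""

    def __init__(self):
        self.postings = defaultdict(set)  # "field:term" -> set of doc ids

    def add(self, doc_id, fields):
        # Every property and the content are indexed the same way.
        for field, text in fields.items():
            for term in tokenize(text):
                self.postings[f"{field}:{term}"].add(doc_id)

    def search(self, field, phrase):
        """Docs whose field contains every term of phrase (AND, not positional)."""
        sets = [self.postings.get(f"{field}:{t}", set()) for t in tokenize(phrase)]
        return set.intersection(*sets) if sets else set()

index = FieldIndex()
index.add(1, {"doctype": "memo", "description": "Fire Stefano",
              "contents": "Effective January 31"})
index.add(2, {"doctype": "memo", "description": "Hire Stefano",
              "contents": "Effective February 1"})
index.add(3, {"doctype": "report", "description": "Fire drill",
              "contents": "January schedule"})

# doctype="memo" and description contains "Fire Stefano"
# and contents contains "January"
hits = (index.search("doctype", "memo")
        & index.search("description", "Fire Stefano")
        & index.search("contents", "January"))
print(sorted(hits))  # [1]
```

Here `doctype="memo"` is approximated by a term lookup because "memo" is a single token. A stricter store would keep an untokenized copy of structured properties for exact matches.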

Ok, good, it seems we have a direction now.

--
Stefano.

