On 19 Jan 2004, at 17:00, Michael Oliver wrote:
On Mon, 2004-01-19 at 08:31, Stefano Mazzocchi wrote:
On 19 Jan 2004, at 15:12, Michael Oliver wrote:
On Mon, 2004-01-19 at 06:32, Stefano Mazzocchi wrote:
I personally wouldn't know how to make use of a query against full text *and* properties. This is because such a query looks weird to me: full-text has the least structure possible (get me everything but I don't know where) while properties tend to be very much structured (last modified time, author, and so on).
There is a decades-long discussion on what is data and what is metadata and I don't want to touch that with a stick, but I think that if you need to do full-text search on your metadata there is something wrong.
Stefano, with all due respect, there is nothing wrong with a full-text search on metadata, because metadata in this case can be any properties of any of the resources in the repository, and that metadata can be free-form text.
Well, this is because I try to avoid having metadata that can be free form text, but as I said, this is my way and I don't want to impose it on others.
Well as long as we CAN have properties that are free form text, we can't
avoid them.
Very true.
consider a search query like
doctype="memo" and description contains "Fire Stefano" and contents contains "January"
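A minimal sketch of how a query like this might be evaluated, assuming resources are held in memory as content plus a property map. The `Resource` class and the `contains` and `matches` helpers are hypothetical names for illustration, not Slide or WebDAV API, and "contains" is approximated here by a case-insensitive substring test rather than a real full-text match.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical stand-in for a repository resource: content plus properties."""
    contents: str
    properties: dict = field(default_factory=dict)


def contains(text, phrase):
    # Case-insensitive substring test; a real full-text engine would tokenize.
    return phrase.lower() in text.lower()


def matches(res):
    # doctype="memo" and description contains "Fire Stefano"
    # and contents contains "January"
    return (
        res.properties.get("doctype") == "memo"
        and contains(res.properties.get("description", ""), "Fire Stefano")
        and contains(res.contents, "January")
    )


resources = [
    Resource("Meeting moved to January 26.",
             {"doctype": "memo", "description": "Why we should fire Stefano"}),
    Resource("Budget for January.",
             {"doctype": "report", "description": "Fire Stefano"}),
    Resource("Nothing scheduled in March.",
             {"doctype": "memo", "description": "Fire Stefano now"}),
]

# Only the first resource satisfies all three clauses.
hits = [r for r in resources if matches(r)]
```

The structured clause (exact match on doctype) and the two unstructured clauses (text containment) are combined with a plain boolean "and", so one query spans properties and contents.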
I would think that this schema is not appropriate. A description is part of content, not metadata. But it's like arguing about whether something should be an element or an attribute... sometimes it's just subjective.
No, I don't think so. Metadata IS data about data, eh?
Right, and for this very reason metadata is data.... this "about" is the key: its semantic meaning is relative, not absolute.
And a "description" can't be anything else. You certainly don't think that the "description" of a binary file stored in Slide (the content), which is text, is part of the content?
eheh, we can agree to disagree then: my content could be something like
<image>
  <bits> ... base64-encoded bitstream of the image ... </bits>
  <description> this is an image about a horse </description>
</image>
or it could be a GIF image (which *does* have optional text-based descriptions at the end of it).
In JCR we have nodes and properties but we don't specify that nodes are data and properties are metadata. There are some properties that are read-only and auto-generated by the containers (for example things like last-modified-time, or creation-time) but they could also be things like width/height for images, autogenerated because the container knows that the nodetype is an image, reacts to this, and extracts the information.
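A rough sketch of that kind of container-side extraction, assuming a hypothetical `store` function and an `EXTRACTORS` table keyed by a type string (neither is JCR API). The one real detail used is the GIF layout: a 6-byte signature followed by the width and height as little-endian 16-bit integers.

```python
import struct


def extract_gif_properties(data: bytes) -> dict:
    # GIF streams start with a 6-byte signature, then width and height
    # as two little-endian unsigned 16-bit integers.
    if data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF stream")
    width, height = struct.unpack("<HH", data[6:10])
    return {"width": width, "height": height}


# Hypothetical registry: which extractor the container runs for which type.
EXTRACTORS = {"image/gif": extract_gif_properties}


def store(node_type, data, user_properties=None):
    """Hypothetical container hook: merge auto-generated properties
    with the ones supplied by the user."""
    properties = dict(user_properties or {})
    extractor = EXTRACTORS.get(node_type)
    if extractor:
        properties.update(extractor(data))
    return {"type": node_type, "data": data, "properties": properties}


# A minimal GIF header for a 320x200 image (no pixel data).
header = b"GIF89a" + struct.pack("<HH", 320, 200) + b"\x00\x00\x00"
node = store("image/gif", header,
             {"description": "this is an image about a horse"})
```

Here the user-supplied description and the extracted width/height end up in the same property map, which is why the data/metadata split is hard to pin down.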
Anyway, this is an academic point as I do agree that if people store strings in properties, we need a way to search them using full-text.
Slide/WebDAV properties that can be created and saved by the user are all about categorization and description of the content, almost for the express purpose of being able to find the right content, and therefore should be very much part of the search mechanism.
Very much agreed, yes.
doctype and description are properties with string values that would be
indexed and matched with the same index as the contents.
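One way this could look, as a sketch only: a single inverted index whose keys are (field, token) pairs, so property values and contents share the same structure. `FieldedIndex` and `tokenize` are hypothetical names, and the sketch assumes no property is itself called "contents".

```python
from collections import defaultdict
import re


def tokenize(text):
    # Lowercase alphanumeric tokens; real analyzers do much more.
    return re.findall(r"[a-z0-9]+", text.lower())


class FieldedIndex:
    """Hypothetical single index over both properties and contents."""

    def __init__(self):
        # (field name, token) -> set of resource ids
        self.postings = defaultdict(set)

    def add(self, rid, contents, properties):
        fields = dict(properties)
        fields["contents"] = contents  # contents is just one more field
        for name, value in fields.items():
            for tok in tokenize(value):
                self.postings[(name, tok)].add(rid)

    def search(self, field, text):
        # Ids of resources whose field holds every token of text
        # (token order is ignored in this sketch).
        toks = tokenize(text)
        if not toks:
            return set()
        sets = [self.postings.get((field, t), set()) for t in toks]
        return set.intersection(*sets)


index = FieldedIndex()
index.add("/memos/1", "Meeting moved to January 26.",
          {"doctype": "memo", "description": "Fire Stefano"})
index.add("/reports/7", "January budget.",
          {"doctype": "report", "description": "Quarterly numbers"})

result = (index.search("doctype", "memo")
          & index.search("description", "Fire Stefano")
          & index.search("contents", "January"))
```

Each clause of the query becomes a lookup in the same postings table, and the "and" becomes a set intersection.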
So, are you suggesting that we index everything? [not critical, just curious]
Absolutely. If someone wants to save some piece of information, they will want to retrieve it and search for it.
Ok, good, it seems we have a direction now.
-- Stefano.
