Hello,

I have some questions / doubts about the use of equality and ordering of
nodes/properties in JSR 170 and JSR 283. IIUC, you can configure
orderableChildNodes. I suppose this ordering is stored in the db or FS,
depending on what you are using to persist your data.

Now, AFAICS, even if you have not set orderableChildNodes, you can still
query nodes and order them by some node/property (through the UN_TOKENIZED
lucene field). IIUC, equality in an XPath or SQL query is also resolved through
the lucene index.

From JSR-283 4.6.2 I understand that, given the last sentence ("Support of
equality and order comparison of BINARY values is not required"), support for
equality and order *is* required for non-binary values. The current JR
implementation therefore 'indexes' (UN_TOKENIZED) the stringValue of *every*
property as one single lucene term in the index (see NodeIndexer
addStringValue). But, IMHO, who wants to order on the text body of a document,
or do an equality comparison on the body of a text? Ordering and equality are
done on things like author and date, not on document contents.

So, IMO, it would be better for the specification to allow configuration
that indicates whether ordering or equality is possible for a property. If that
is not possible, I think we might need to alter the current Jackrabbit
implementation so that it can be configured per property "how" equality and
ordering are implemented. The reason is that if I have representative data,
with for example about 10 properties per document, of which one is "body"
(~10 kb), 1/3 of the index consists of the *never* used UN_TOKENIZED *body*
property (= a single lucene term that is 99.9999% sure to be unique). This
really is a waste. If the JSR is reluctant regarding configurable equality, we
could store in lucene, for larger values, a term that is some checksum(),
though we would then have no 100% guaranteed equality, which is probably pretty
undesirable.
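
To make the checksum idea concrete, here is a minimal sketch in plain Java. It
is not Jackrabbit or Lucene code: the class name ChecksumTerm, the method
termFor, the 256 character threshold and the "sha1:" prefix are all made up for
illustration. It only shows what the stored term could look like, and why
equality would then depend on the digest not colliding.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ChecksumTerm {

    // Hypothetical cut-off: values up to this length are kept verbatim.
    public static final int MAX_EXACT_LENGTH = 256;

    // Returns the string that would be stored as the single untokenized term.
    public static String termFor(String value) {
        if (value.length() <= MAX_EXACT_LENGTH) {
            return value; // short values: exact equality and ordering still work
        }
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(value.getBytes(StandardCharsets.UTF_8));
            // Assumed prefix to mark hashed terms; a short value that happens
            // to start with "sha1:" is not handled in this sketch.
            StringBuilder sb = new StringBuilder("sha1:");
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a standard JDK algorithm, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        StringBuilder body = new StringBuilder();
        for (int i = 0; i < 10000; i++) {
            body.append('x');
        }
        System.out.println(termFor("Ard"));
        System.out.println(termFor(body.toString()).length());
    }
}
```

With such a scheme a ~10 kb body would shrink to a 45 character term, but two
different bodies with the same digest would compare as equal, and ordering on
the hashed values would no longer reflect the ordering of the original strings.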

My preference (easy to achieve, because I have already implemented it
locally) would be to allow equality/ordering to be set to false in the upcoming
1.4 IndexingConfiguration [1]. Then you can just configure, for example, the
body property to not be added to the index as UN_TOKENIZED.
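
Roughly, the behaviour I have in mind could be modelled like this. This is a
simplified stand-alone sketch, not the actual NodeIndexer code and not my local
patch: the class ExactMatchConfig, its methods and the "fulltext:" / "exact:"
field names are hypothetical and only illustrate the decision per property.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ExactMatchConfig {

    // Property names for which equality/ordering is switched off.
    private final Set<String> excluded = new HashSet<>();

    public ExactMatchConfig exclude(String propertyName) {
        excluded.add(propertyName);
        return this;
    }

    // Lists the (hypothetical) index fields that would be written for a property.
    public List<String> fieldsFor(String propertyName) {
        List<String> fields = new ArrayList<>();
        // Assumption: the tokenized fulltext field is always written.
        fields.add("fulltext:" + propertyName);
        if (!excluded.contains(propertyName)) {
            // The single untokenized term used for equality and ordering.
            fields.add("exact:" + propertyName);
        }
        return fields;
    }

    public static void main(String[] args) {
        ExactMatchConfig config = new ExactMatchConfig().exclude("body");
        System.out.println(config.fieldsFor("body"));
        System.out.println(config.fieldsFor("author"));
    }
}
```

Properties that are not excluded keep the current behaviour, so queries that
compare or order on author or date would be unaffected.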

WDOT?

Regards Ard

[1] http://wiki.apache.org/jackrabbit/IndexingConfiguration


-- 

Hippo
Oosteinde 11
1017WT Amsterdam
The Netherlands
Tel  +31 (0)20 5224466
-------------------------------------------------------------
[EMAIL PROTECTED] / [EMAIL PROTECTED] / http://www.hippo.nl
-------------------------------------------------------------- 
