I haven't used cts:similar-query before, and it seems that the value of the 
max-terms option greatly affects the results.
I haven't changed any of the database settings, so I'm running with the 
defaults.
I notice that max-terms defaults to 16.
I've been using cts:distinctive-terms to get a feel for which terms 
cts:similar-query will use as I change max-terms.
My first thought was to set max-terms to the number of words in $node (i.e., 
tokenize on spaces); then I thought maybe I should double that to account for 
the word pairs.
Is there a rule of thumb here?  (BTW, I'm doing this with 3 different 
databases, with fragment counts of 24M, 131M and 287M, so there are plenty of 
fragments for similar-query to work on...)
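
To make the counting idea above concrete, here is a minimal Python sketch of 
the two estimates (word count, and word count plus adjacent word pairs). The 
function name estimate_max_terms is hypothetical, and the assumption that the 
candidate terms are single words plus adjacent pairs is only my guess at what 
is going on, not documented MarkLogic behavior.

```python
def estimate_max_terms(text: str, include_pairs: bool = True) -> int:
    """Hypothetical helper: rough guess at a max-terms value for a node's text.

    Assumption (not confirmed MarkLogic behavior): candidate terms are the
    whitespace-separated words, plus each pair of adjacent words.
    """
    tokens = text.split()             # tokenize on runs of whitespace
    singles = len(tokens)             # one candidate term per word
    if not include_pairs:
        return singles
    pairs = max(len(tokens) - 1, 0)   # adjacent word pairs: n words give n - 1 pairs
    return singles + pairs
```

For a node with n words this gives 2n - 1 rather than exactly 2n, so 
"doubling" is a close upper bound under that assumption. Repeated words are 
counted each time they occur, so the result is a ceiling rather than an exact 
count of distinct terms.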

A second question, regarding the cts:distinctive-terms output: what does an 
empty cts:term element mean?
<cts:term id="4083217226504034818" val="504" score="1032192" 
confidence="0.453548" fitness="0" 
xmlns:cts="http://marklogic.com/cts"></cts:term>

It'd be nice to know what this "term" is since it's the highest scoring term in 
the list...
Thanks,
David