Thilo Goetz wrote:
> Marshall Schor wrote:
>
>> Thilo Goetz wrote:
>>
>>> Marshall Schor wrote:
>>>
>>>> Thilo Goetz wrote:
>>>>
>>>>> See the Jira issue for the cause of the problem. More
>>>>> comments below.
>>>>>
>>>>> Marshall Schor wrote:
>>>>>
>>>>>> So, there may be 2 things to look at here - the actual error, described
>>>>>> above, and the more philosophical question on the behavior of moveTo -
>>>>>> this seems to require a sorting order if the item "moved to" is not
>>>>>> present in the index. Perhaps this needs to be documented better. And
>>>>>>
>>>>> I'm not sure I understand your point about moveTo(). It requires the
>>>>> index to be sorted to make any sense (and the BagIndex moveTo() is broken,
>>>>> but that's a different issue
>>>>>
>>>> Will you be fixing this too?
>>>>
>>> We enter the realm of philosophy again. What's the right
>>> behavior for moveTo() when the underlying index isn't sorted?
>>> In particular, what should happen when no proper element
>>> is found? The javadocs say:
>>>
>>> Note that any operation like find() or FSIterator.moveTo() will not
>>> produce useful results on bag indexes, since bag indexes do not honor
>>> comparators. Only use a bag index if you want very fast adding and
>>> will have to iterate over the whole index anyway.
>>>
>> I like systems where user errors are reported :-). If find() and
>> moveTo() don't work on bag indexes, I would prefer they throw an
>> exception, perhaps like UnsupportedOperationException or our equivalent
>> in UIMA.
>>
>
> Fine with me.
>
>>>>> ). moveTo(fs) will position the iterator such
>>>>> that any element "to the left" is smaller than fs, and all elements at the
>>>>> moved-to position and "to the right" of it are greater than or equal to
>>>>> fs. It doesn't matter if the item "moved to" is in the index or not.
>>>>> Remember that equality here is defined with respect to the sort order of
>>>>> the index, it is not feature structure identity.
>>>>>
>>>> Yes, this is something that is unexpected (to me), and I did forget this.
>>>>
>>>>> All this is documented,
>>>>> but maybe not as clearly as it could be.
>>>>>
>>>>>> what if no sorting order was defined for the set index?
>>>>>>
>>>>> Every set index has a sort order.
>>>>>
>>>> This is the part that seems confusing, because our docs say that set
>>>> indexes do not enforce ordering, and the common definition for Sets does
>>>>
>>> Where did you find that? The javadocs say that set indexes are
>>> not guaranteed to be sorted. That's different from saying there's
>>> no ordering relation on the members. How else would we determine
>>> equality?
>>>
>> Just by testing the key values for equality, not for order.
>>
>
> Equality here is a notion derived from the partial order
> defined on the index. You could define equality separately,
> but that would mean introducing a new notion into the index
> definitions. I don't think we want that, or at least I don't.
>
I agree we don't want to introduce a new notion of equality for index
definitions at this point.

>>> Maybe we should remove this text, because at this time, set indexes
>>> are sorted, and that's not likely to change (I was thinking of hash
>>> based sets when I wrote that; still, you'll need a notion of equality,
>>> no matter how you implement your sets, yet they don't need to be
>>> sorted).
>>>
>>>> not have an ordering concept. Yet our docs say that the sort order for
>>>> sets is used to determine "equality" among candidates in the set: from
>>>> section 2.4.1.7:
>>>>
>>>> An index may define one or more /keys/. These keys determine the sort
>>>> order of the feature structures within a sorted index, and determine
>>>> equality for set indexes.
>>>>
>>> That is incorrect.
>>> It should say "0 or more keys". Though if we should
>>> alert users to this fact if even UIMA developers have trouble with this
>>> is doubtful.
>>>
>> I think some of our users could be better at remembering these details
>> than I am :-) I think this should be fixed - it's just a typo IMHO.
>>
>>>> Perhaps this should also say something about the use of the sort order
>>>> in "moveTo(fs)" for sets?
>>>>
>>> In our current implementation, set indexes are sorted indexes
>>> without the duplicates (duplicates with respect to the ordering
>>> relation of that index, of course). If we commit to this and
>>> stop waffling about how set indexes may not be sorted, then we
>>> can just say that sorted and set indexes behave the same way.
>>>
>> My preference is to keep the original definitions - leaving (perhaps
>> unrealistically small) room for alternative implementations in the future.
>>
>
> Sure, but how do you propose we improve the documentation, then?
>
I'll take a crack at doing this
-Marshall

>>>>> If that sort order is empty, it means
>>>>> that all FSs are equal for that index. That in turn means that this
>>>>> index will contain at most 1 FS at any time. It also means that moveTo()
>>>>> will always position the iterator at that one element, if it exists.
>>>>>
>>>>> Did that help at all?
>>>>>
>>>> Yes, thanks for the clarifications.
>>>>
>>>> -Marshall
>>>>
>>>>> --Thilo
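
For anyone trying to picture the semantics discussed in this thread, here is a minimal sketch in plain Java. It does not use the UIMA API at all: the class MoveToSketch, the helpers moveTo() and noKeys(), and the use of strings ordered by length as stand-ins for feature structures are all assumptions made for illustration. It only models three points from the discussion: moveTo(fs) as a "first element not smaller than fs" search, set-index equality derived from the ordering rather than from identity, and an empty sort order collapsing a set index to at most one element.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Illustrative model only; none of these names come from UIMA.
final class MoveToSketch {

    // Returns the position of the first element that is NOT smaller than key
    // under cmp, or sorted.size() if every element is smaller. Everything to
    // the left of the result is smaller than key; everything at or to the
    // right of it is greater than or equal to key. The key itself does not
    // have to be in the list.
    static <T> int moveTo(List<T> sorted, T key, Comparator<? super T> cmp) {
        int lo = 0;
        int hi = sorted.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cmp.compare(sorted.get(mid), key) < 0) {
                lo = mid + 1;   // mid is smaller than key: answer is to the right
            } else {
                hi = mid;       // mid is >= key: answer is mid or to the left
            }
        }
        return lo;
    }

    // Models an index defined with zero keys: every pair compares as equal.
    static <T> Comparator<T> noKeys() {
        return (a, b) -> 0;
    }

    public static void main(String[] args) {
        // The "sort order" of this toy index is string length.
        Comparator<String> byLength = Comparator.comparingInt(String::length);
        List<String> sortedIndex = Arrays.asList("a", "bb", "cc", "dddd");

        // "xx" is not in the index, yet moveTo still has a defined position.
        System.out.println(moveTo(sortedIndex, "xx", byLength));    // 1
        System.out.println(moveTo(sortedIndex, "zzz", byLength));   // 3
        System.out.println(moveTo(sortedIndex, "eeeee", byLength)); // 4 (past the end)

        // A set modeled as "sorted without duplicates": "cc" is rejected
        // because it is equal to "bb" under the ordering, not by identity.
        TreeSet<String> setIndex = new TreeSet<>(byLength);
        System.out.println(setIndex.add("bb")); // true
        System.out.println(setIndex.add("cc")); // false

        // With no keys, all elements are equal, so at most one is kept.
        TreeSet<String> keyless = new TreeSet<>(MoveToSketch.<String>noKeys());
        keyless.add("a");
        keyless.add("bb");
        System.out.println(keyless.size()); // 1
    }
}
```

In this model a result equal to the list size corresponds to an iterator with no valid position. With noKeys() the search always returns position 0, which matches the remark above that moveTo() ends up at the one element, if it exists.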
