Re: Adjusting the behavior of SelectFS limit and shifted operations

Richard Eckart de Castilho Fri, 20 Nov 2020 13:05:13 -0800

Hi Mario,

If I understand you correctly, you imagine select() to be a streaming
operation. Actually, it is not, at least not immediately.

When select() is invoked, it creates an object that is a hybrid between a 
builder, an Iterable and a Stream. If at any point you invoke an Iterable or 
Stream method on it, it loses the other personalities.

Methods such as the following belong to the "builder" personality:

- following(x)
- coveredBy(x)
- covering(x)
- ...

- shifted(y)
- backwards()
- noneOverlapping()
- typePriorities()
- ...

While you are working with the "builder" personality, the order of the method
calls has no effect. E.g. the following calls are all equivalent:

cas.select(Token.class).shifted(-1).following(t3).backwards()
cas.select(Token.class).following(t3).backwards().shifted(-1)
cas.select(Token.class).backwards().shifted(-1).following(t3)

If you try to give conflicting instructions to the builder personality, the 
last instruction should be used, e.g.

cas.select(Token.class).following(t3).shifted(-1).preceding(t4)

should be equivalent to

cas.select(Token.class).preceding(t4).shifted(-1)

(... or if there are bugs it might do something unexpected ...)
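A minimal sketch of why call order does not matter on the builder personality, written in plain Java rather than against UIMA itself. ToySelect, ToySelectDemo, the integer "tokens" (begin offsets) and the toy shift semantics (skip n results, n >= 0 only) are all hypothetical. The only point is that builder calls record settings which the terminal asList() applies in one fixed order, and that setting a new bound replaces the previous one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical toy builder, NOT the UIMA SelectFS API. Tokens are reduced to
// sorted begin offsets. Builder calls only record settings; nothing is
// evaluated until the terminal asList() call.
final class ToySelect {
    private final List<Integer> tokens;
    private Integer followingX = null; // bound: tokens strictly after x
    private Integer precedingX = null; // bound: tokens strictly before x
    private boolean backwards = false;
    private int shift = 0; // toy semantics: skip this many results (n >= 0)

    ToySelect(List<Integer> tokens) { this.tokens = tokens; }

    // A new bound replaces any previously set bound: the last instruction wins.
    ToySelect following(int x) { followingX = x; precedingX = null; return this; }
    ToySelect preceding(int x) { precedingX = x; followingX = null; return this; }
    ToySelect backwards() { backwards = true; return this; }
    ToySelect shifted(int n) {
        if (n < 0) throw new IllegalArgumentException("toy supports n >= 0 only");
        shift = n;
        return this;
    }

    // Terminal operation: the recorded settings are applied in one fixed
    // order (bound, direction, shift), independent of the call order above.
    List<Integer> asList() {
        List<Integer> result = new ArrayList<>();
        for (int t : tokens) {
            boolean inBounds = (followingX == null || t > followingX)
                    && (precedingX == null || t < precedingX);
            if (inBounds) result.add(t);
        }
        if (backwards) Collections.reverse(result);
        int skip = Math.min(shift, result.size());
        return new ArrayList<>(result.subList(skip, result.size()));
    }
}

class ToySelectDemo {
    public static void main(String[] args) {
        List<Integer> tokens = List.of(0, 5, 10, 15, 20, 25);
        // Same settings, different call order: identical results.
        System.out.println(new ToySelect(tokens).shifted(1).following(10).backwards().asList());
        System.out.println(new ToySelect(tokens).following(10).backwards().shifted(1).asList());
        System.out.println(new ToySelect(tokens).backwards().shifted(1).following(10).asList());
        // Conflicting bounds: preceding(20) overrides following(5).
        System.out.println(new ToySelect(tokens).following(5).shifted(1).preceding(20).asList());
        System.out.println(new ToySelect(tokens).preceding(20).shifted(1).asList());
    }
}
```

Running ToySelectDemo prints [20, 15] three times, then [5, 10, 15] twice.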

Methods like coveredBy(x) or covering(x) set up bounds for the iterator that
SelectFS creates internally.
I think the initial idea for following(x) and preceding(x) was that they would
not define bounds, but IMHO that does not make much sense. From my perspective,
they also define bounds: either from the beginning of the document to x
(preceding) or from x to the end of the document (following). There is also the
startAt(x) method. It does not define a boundary; it just moves the iterator to
a given start position.

So while the following operations are bounded:

cas.select(Token.class).following(x).asList()
cas.select(Token.class).preceding(x).asList()

these operations are their respective unbounded counterparts:

cas.select(Token.class).startAt(x).asList()
cas.select(Token.class).startAt(x).backwards().asList()

The unbounded versions behave a bit differently from the bounded ones. E.g.
preceding(x) returns annotations in document order, while
startAt(x).backwards() returns them in iteration order (i.e. reverse document
order). Also, following(x) and preceding(x) never include x in their results,
while startAt(x) should return x as the first entry in the result list. I hope
I explained this correctly, that it makes sense, and that it mostly matches the
implementation. I am still working on setting up a tighter test suite to make
sure it does ;)
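The difference between the bounded and unbounded variants can be modeled with a small, hypothetical Java sketch (ToyBounds is not UIMA code). Annotations are reduced to distinct, sorted begin offsets, and the toy assumes that startAt(x).backwards() also returns x as its first entry, which the paragraph above only states for startAt(x).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical toy model, NOT UIMA code. Annotations are reduced to
// distinct, sorted begin offsets; x must be one of them.
final class ToyBounds {
    // Bounded: from x (exclusive) to the end of the document, in document order.
    static List<Integer> following(List<Integer> doc, int x) {
        List<Integer> out = new ArrayList<>();
        for (int a : doc) if (a > x) out.add(a);
        return out;
    }

    // Bounded: from the start of the document to x (exclusive), still in document order.
    static List<Integer> preceding(List<Integer> doc, int x) {
        List<Integer> out = new ArrayList<>();
        for (int a : doc) if (a < x) out.add(a);
        return out;
    }

    // Unbounded: position the iterator at x (inclusive) and iterate forwards to the end.
    static List<Integer> startAt(List<Integer> doc, int x) {
        int i = doc.indexOf(x);
        return new ArrayList<>(doc.subList(i, doc.size()));
    }

    // Unbounded, backwards (assumption: x is included as the first entry):
    // walk from x back to the start, so results come out in reverse document order.
    static List<Integer> startAtBackwards(List<Integer> doc, int x) {
        int i = doc.indexOf(x);
        List<Integer> out = new ArrayList<>(doc.subList(0, i + 1));
        Collections.reverse(out);
        return out;
    }

    public static void main(String[] args) {
        List<Integer> doc = List.of(0, 5, 10, 15, 20);
        System.out.println(following(doc, 10));        // [15, 20]
        System.out.println(preceding(doc, 10));        // [0, 5]
        System.out.println(startAt(doc, 10));          // [10, 15, 20]
        System.out.println(startAtBackwards(doc, 10)); // [10, 5, 0]
    }
}
```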

select() only really becomes a stream if you invoke stream() or a method from 
the Stream interface (e.g. filter() or map()). It can also become a list, an 
array, or an iterator. So the following is actually *not* possible:

select(Token.class).filter(t -> t.getCoveredText().equals("blah")).shifted(1)

because "shifted()" is a method from the builder personality of SelectFS while 
"filter()" is a method of the Stream personality. However, this would work:

select(Token.class).filter(t -> t.getCoveredText().equals("blah")).skip(1)

because "skip()" is a method on Stream.
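A hypothetical toy (ToyPersonalities, not the UIMA API) that mimics the personality switch with a plain java.util.stream.Stream: shifted() exists only on the builder, while filter() and stream() hand back a Stream, on which skip() is available but shifted() is not. The toy's shift simply skips the first n tokens; that semantics is an assumption made for illustration.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical toy, NOT the UIMA API: a builder that turns into a plain
// java.util.stream.Stream as soon as a Stream-style method is called.
final class ToyPersonalities {
    private final List<String> tokens;
    private int shift = 0; // toy semantics: skip the first n tokens (n >= 0)

    ToyPersonalities(List<String> tokens) { this.tokens = tokens; }

    // Builder personality: records a setting and returns the builder itself.
    ToyPersonalities shifted(int n) {
        if (n < 0) throw new IllegalArgumentException("toy supports n >= 0 only");
        shift = n;
        return this;
    }

    // Switch to the Stream personality: apply the builder settings once and
    // hand back an ordinary Stream. Builder methods are gone from here on.
    Stream<String> stream() { return tokens.stream().skip(shift); }

    // Mirrors Stream.filter() and, like stream(), switches personalities.
    Stream<String> filter(Predicate<String> p) { return stream().filter(p); }

    public static void main(String[] args) {
        List<String> toks = List.of("a", "blah1", "b", "blah2", "c", "blah3");

        // Works: skip() is a Stream method, so it drops the first match.
        List<String> afterSkip = new ToyPersonalities(toks)
                .filter(t -> t.startsWith("bl")).skip(1).collect(Collectors.toList());
        System.out.println(afterSkip); // [blah2, blah3]

        // Works: shifted() is called while still on the builder.
        List<String> afterShift = new ToyPersonalities(toks).shifted(4)
                .filter(t -> t.startsWith("bl")).collect(Collectors.toList());
        System.out.println(afterShift); // [blah3]

        // Does not compile: Stream has no shifted() method.
        // new ToyPersonalities(toks).filter(t -> t.startsWith("bl")).shifted(1);
    }
}
```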

Ok, but independent of the different personalities of select(), I understand
that you'd find it neither logical nor intuitive that limit and shifted
interact with each other. But you do support the idea of capping shift at 0
and simply ignoring any smaller values for bounded selections.
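A minimal sketch of the proposed capping for bounded selections, under stated assumptions: ToyBoundedShift is hypothetical, a positive shift is assumed to skip that many results, a negative shift is capped at 0, and limit is assumed to apply to whatever remains after the shift. This is not the current UIMA implementation.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of the behavior proposed in this thread, NOT the current
// UIMA implementation. Assumptions: on a bounded selection a negative shift is
// capped at 0 (ignored), a positive shift skips that many results, and limit
// (which must be >= 0) applies to what remains after the shift.
final class ToyBoundedShift {
    static List<Integer> select(List<Integer> bounded, int shift, int limit) {
        int effectiveShift = Math.max(0, shift); // shifts below 0 are ignored
        return bounded.stream()
                .skip(effectiveShift)
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> bounded = List.of(10, 15, 20, 25); // e.g. the tokens following some x
        System.out.println(select(bounded, -1, 2)); // [10, 15] (same as shift 0)
        System.out.println(select(bounded, 0, 2));  // [10, 15]
        System.out.println(select(bounded, 1, 2));  // [15, 20]
    }
}
```

In this model a shift below 0 behaves exactly like shift 0, so it can never reach outside the bounded range, and limit only counts results that survive the shift.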

Cheers,

-- Richard
