Re: [Fwd: Re: Iterators: problem when using standard methods in combination with moveTo*]

Marshall Schor Thu, 12 Jul 2007 09:04:41 -0700

It seems to me the only source of confusion is when users call
next() to get an element and advance to the next one, and then call
moveToPrevious(), which, as you say, may or may not work if the iterator
ended up "invalid" because next() moved it past the last element.


So the only "improvement" I would think is wanted is to make the moveTo
operations work reliably in these kinds of cases.  Is that hard to do?

-Marshall

Thilo Goetz wrote:

What is the expected behavior?  Here's the impl:

  public boolean hasNext() {
    return isValid();
  }

  public Object next() {
    Object result = get();
    moveToNext();
    return result;
  }

Perfectly reasonable, but has some consequences
that may not be obvious.  For example, when you
do

FS fs1 = it.next();
FS fs2 = it.get();

then fs1 != fs2.  Is that intuitive?  I don't know.
Is it fixable?  Not easily, no.

We also have this more subtle behavior:

FS fs1 = it.next();
it.moveToPrevious();
FS fs2 = it.next();

The last line may throw a NoSuchElementException.  Why?
Because the first line may invalidate the iterator, and
then moveToPrevious() will not normally make the iterator
valid again (sometimes it will, depending on the iterator
implementation).
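
A minimal sketch of the behavior described above.  ToyCursor is a
hypothetical array-backed stand-in, not a UIMA class.  It implements
hasNext() and next() as quoted, and it assumes that moveToPrevious()
on an invalid cursor does nothing, which is only one of the possible
behaviors mentioned here.

```java
import java.util.NoSuchElementException;

// Hypothetical stand-in for an FSIterator-like cursor; not UIMA code.
class ToyCursor {
    private final String[] items;
    private int pos = 0;

    ToyCursor(String... items) { this.items = items; }

    boolean isValid() { return pos >= 0 && pos < items.length; }

    String get() {
        if (!isValid()) throw new NoSuchElementException();
        return items[pos];
    }

    void moveToNext() { if (isValid()) pos++; }

    // Assumption: an invalid cursor stays invalid when moved back.
    void moveToPrevious() { if (isValid()) pos--; }

    // Same shape as the implementation quoted in this message.
    boolean hasNext() { return isValid(); }

    String next() {
        String result = get();
        moveToNext();
        return result;
    }

    public static void main(String[] args) {
        ToyCursor it = new ToyCursor("a", "b");
        String fs1 = it.next();               // returns "a", cursor is now on "b"
        String fs2 = it.get();                // returns "b"
        System.out.println(fs1.equals(fs2));  // prints false
        it.next();                            // returns "b", cursor is now past the end
        it.moveToPrevious();                  // no effect under the assumption above
        System.out.println(it.hasNext());     // prints false; a further next() would throw
    }
}
```

With an implementation where moveToPrevious() does revalidate the
cursor, the last line would print true instead, which is why the
combination is hard to rely on.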

So in terms of what is reasonable, the iterators behave
as expected.  Still, because the interactions are so
subtle, it is not a good idea to mix the paradigms.  I
never do, even though I think I understand what's going
on.

I'm pretty sure I've documented this before.  I don't know
where that text went.  Maybe I dreamed it.

--Thilo

Marshall Schor wrote:

Thilo - is this "fixable" - so it just works as users expect?

-Marshall

-------- Original Message --------
Subject:     Re: Iterators: problem when using standard methods in
combination with moveTo*
Date:     Thu, 12 Jul 2007 13:33:31 +0200
From:     Thilo Goetz <[EMAIL PROTECTED]>
Reply-To:     [EMAIL PROTECTED]
To:     [EMAIL PROTECTED]

Hi Julien,

Julien Nioche wrote:

Thilo and Marshall,

Thanks for sharing the tip. Indeed it would be a good idea to add this
little example to the documentation.

A quick comment about the Iterator methods. I had a problem with the
following piece of code:

  while (wordFormIterator.hasNext()) {
    WordForm wf = (WordForm) wordFormIterator.next();
    if (wf.getBegin() == token.getBegin() && wf.getEnd() == token.getEnd()) {
      liste.add(wf);
    } else {
      // move back
      wordFormIterator.moveToPrevious();
      return liste;
    }
  }

The last element of the iterator was never accessible because
hasNext() returned false despite the fact that there WAS an element
left in there. moveToPrevious() had been previously called on this
iterator.

Should not hasNext() return true even if the cursor has been moved
forward or backward within the iterator? Or is the use of the legacy
methods (hasNext(), next()) incompatible with the moveTo* methods?

hm, I thought this was in our documentation, but couldn't find it myself.
You should not mix the use of next()/hasNext() with the methods defined
in the FSIterator interface.  They do not work well together.  If you use
the FSIterator APIs, you should use them exclusively.  Sorry about that.
I'll add a comment to the javadocs.
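
As a rough sketch of what "use them exclusively" could look like for
a loop like the one quoted above: the version below only uses
isValid(), get() and moveToNext(), so it never has to step back.
Span, Cursor and collectMatching are hypothetical stand-ins made up
for this illustration; they are not UIMA classes, and the real
FSIterator may differ in details.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CursorOnlyLoop {
    // Hypothetical stand-in for an annotation: begin and end offsets only.
    static final class Span {
        final int begin;
        final int end;
        Span(int begin, int end) { this.begin = begin; this.end = end; }
    }

    // Hypothetical cursor offering only the cursor-style operations.
    static final class Cursor {
        private final List<Span> items;
        private int pos = 0;
        Cursor(List<Span> items) { this.items = items; }
        boolean isValid() { return pos >= 0 && pos < items.size(); }
        Span get() { return items.get(pos); }
        void moveToNext() { if (isValid()) pos++; }
    }

    // Collects consecutive spans with the same offsets as token.
    // The cursor is left on the first non-matching element, so the
    // caller can continue from there without moving back.
    static List<Span> collectMatching(Cursor it, Span token) {
        List<Span> result = new ArrayList<>();
        while (it.isValid()) {
            Span wf = it.get();
            if (wf.begin != token.begin || wf.end != token.end) {
                break; // look without consuming: cursor has not advanced
            }
            result.add(wf);
            it.moveToNext();
        }
        return result;
    }

    public static void main(String[] args) {
        Cursor it = new Cursor(Arrays.asList(
            new Span(0, 3), new Span(0, 3), new Span(4, 7)));
        List<Span> first = collectMatching(it, new Span(0, 3));
        List<Span> second = collectMatching(it, new Span(4, 7));
        System.out.println(first.size() + " " + second.size()); // prints "2 1"
    }
}
```

The point of the design is that get() reads the current element
without advancing, so the last element stays reachable on the next
call.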

Thanks

Julien

To be a bit more explicit, here's some code that will determine how
many tokens the longest sentence in the document contains.  It's a
silly example, but it illustrates the concept.  Maybe this should go
in the docs.  Note: I have not actually run this code, it may not
work immediately ;-)

    CAS cas = ...;
    Type sentenceType = cas.getTypeSystem().getType("yourSentenceTypeName");
    Type tokenType = cas.getTypeSystem().getType("yourTokenTypeName");
    FSIterator sentenceIt = cas.getAnnotationIndex(sentenceType).iterator();
    AnnotationIndex tokenIndex = cas.getAnnotationIndex(tokenType);
    FSIterator tokenIt;
    int maxLen = 0;
    int currentLen;
    for (sentenceIt.moveToFirst(); sentenceIt.isValid(); sentenceIt.moveToNext()) {
      tokenIt = tokenIndex.subiterator((AnnotationFS) sentenceIt.get());
      currentLen = 0;
      for (tokenIt.moveToFirst(); tokenIt.isValid(); tokenIt.moveToNext()) {
        ++currentLen;
      }
      maxLen = ((maxLen < currentLen) ? currentLen : maxLen);
    }
    System.out.println("Longest sentence contains " + maxLen + " tokens.");

--Thilo

Marshall Schor wrote:

Did you consider using subIterators?  These are (briefly) described in
section 4.7.4 of the Apache UIMA Reference book, and may include exactly
what you're trying to get at - an iterator over elements that are
"contained" in the span of other elements.

-Marshall

Julien Nioche wrote:

Hi,

Sorry if someone already asked the question.
Is there a direct way to obtain from a Cas all the annotations of a
given type located between two positions in the text? Something like
getContained(String type,int start,int end)?
I am trying to get all the Tokens contained within a specific
Sentence. I have used iterators for doing that and compared the offset
with those of the Sentence but it is a bit tedious. Have I missed
something obvious?

Thanks

Julien
