[ 
https://issues.apache.org/jira/browse/OPENNLP-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987614#action_12987614
 ] 

Steven Bethard commented on OPENNLP-99:
---------------------------------------

> We still need to take the exception handling into account

Ok, rewritten with exception handling:

// using ObjectStream 
try {
  Event event = stream.read(); 
  while (event != null) { 
    ... 
    event = stream.read(); 
  }
} catch (IOException e) {
  ...
} 

// using Iterator 
Event event; 
try {
  while (iterator.hasNext()) { 
    event = iterator.next(); 
    ... 
  } 
} catch (OpenNLPIOException e) {
  ...
} 

Looks about the same to me. But maybe you had something else in mind? If you 
could write the code that you're concerned about, that would help me understand 
better.

> our users are usually implementing the ObjectStream,
> but not using it to actually read the data them self

Perhaps it would help to see the use case where I wanted this? Here's the code:

val testStream = new RealValueFileEventStream(testPath)
val events = // wrapper for testStream that makes it into a real Iterator<Event>
for (event <- events) {
  val prediction = model.getBestOutcome(model.eval(event.getContext))
  ...
}

Basically, I take each event from a test file, and ask the model to predict an 
outcome. Is this an atypical use case?

> I still believe that it is easier to implement these streams
> when the underlying data source is an InputStream or Reader.

Yes, I agreed to this earlier. But as a user, I care less about how the streams 
are implemented, and more about the API I have to use when calling them.

Or are you suggesting that most users are going to be implementing custom 
ObjectStreams? If that's the case, you could easily define ObjectStream as an 
abstract class that implements Iterator<T> e.g.:

public abstract class ObjectStream<T> implements Iterator<T>  {
  public abstract T read();
  ...
  private T next = this.read();
  public boolean hasNext() {
    return this.next != null
  }
  public T next() {
    T result = this.next;
    this.next = this.read();
    return result;
  }
}

That way, if you think it's easier to implement the ObjectStream API, you just 
subclass from ObjectStream and you get an Iterator for free.

> And ObjectStream has also a reset method ...
> we would need this method on an Iterator like interface too

Well, the standard Java solution to this would be to declare it as an 
Iterable<T> instead. Then code like:

Event event = stream.read(); 
while (event != null) { 
  ... 
  event = stream.read(); 
} 
stream.reset()
while (event != null) { 
  ... 
  event = stream.read(); 
} 

would instead be written as:

for (Event event : iterable) {
  ...
}
for (Event event: iterable) {
  ...
}

Basically, anything that can be reset is an Iterable, and any single pass 
through it is an Iterator.

I also agree that remove() isn't necessary but the Iterable interface declares 
it as optional, so I don't think that's a problem. Just throw an 
UnsupportedOperationException like it says to in the Iterator javadocs.  (Also, 
note that your AbstractEventStream already has a remove() method that does 
exactly this.)

> EventStream should extend Iterator<Event>
> -----------------------------------------
>
>                 Key: OPENNLP-99
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-99
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Maxent
>    Affects Versions: maxent-3.0.0-sourceforge
>            Reporter: Steven Bethard
>
> [As requested, brought over from Sourceforge.]
> Conceptually, EventStream is just an Iterator<Event>. You would get better 
> interoperability with other Java libraries if EventStream were declared as 
> such. If you didn't care about backwards compatibility, I'd say just get rid 
> of EventStream entirely and use Iterator<Event> everywhere instead.
> If you care about backwards compatibility, you could at least declare 
> AbstractEventStream as implementing Iterator<Event> - it declares all of 
> hasNext(), next() and remove(). I believe that shouldn't break anything, and 
> should make all the current EventStream implementations into Iterator<Event>s.
> Why do I want this? Because, when using OpenNLP maxent from Scala, if a 
> RealValueFileEventStream were an Iterator<Event>, I could write:
>   for (event <- stream) {
>     ...
>   }
> But since it's not, I instead have to wrap it in an Iterator:
>   val events = new Iterator[Event] {
>     def hasNext = stream.hasNext
>     def next = stream.next
>   }
>   for (event <- events) {
>     ...
>   }
> Or write the while loop version:
>   while (stream.hasNext) {
>     val event = stream.next
>     ...
>   }

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to