On Feb 18, 2008 7:53 PM, Eddie Epstein <[EMAIL PROTECTED]> wrote:
> Pycas looks like a pretty complete interface to the CAS, very nice.
Thanks.

> Given the interface with the XMI format, it wouldn't take too much
> more effort to create a pycas annotator that would be interoperable
> with the new uima-as framework extension in Java. [...]
> Is having an interoperable pycas service of interest, or do you see
> the main utility of pycas being offline post-processing of CAS files?

I would be delighted to see pycas extended to become an interoperable service, but I probably won't have time to make it happen myself.

> * Does the term 'feature structure' apply to all instances of
> TOP and its subclasses? [...]
>
> I think TOP should be analogous to Object in Java, [...]
> However, primitive attributes like
> Integer and Float also derive from TOP.

Java differs from Python here -- in Python, *all* types (including primitives) are essentially subclasses of the topmost type (object). You can even have bound methods of primitives, e.g. "inc=(1).__add__". So the notion that Integer and Float derive from TOP makes perfect sense in the Python world.

Nevertheless, UIMA seems to make a pretty strong distinction between primitive values and non-primitive values. And as far as I can tell from usage in the Java code and docs, the term "feature structure" is basically used to mean non-primitive values.

> * Are subclasses of String considered primitive or not? It appears
> from the Java code like they would return false for type.isPrimitive().
>
> Primitive sounds right to me.

Ok, so this may be a bug in the Java implementation, then?

While on the topic of design issues, another thing I've been pondering: I don't see any principled way to decide whether a given sequence-typed feature should be defined as an array or as a linked list. E.g., should a coreference chain be a linked list or an array? It really depends on how the data will be used -- but the same data may get used in different ways by different people.
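To make the array-vs-linked-list question concrete, here is a minimal Python sketch of the "call them both sequences" idea. Everything in it is hypothetical: `Cons`, `from_iterable`, `iter_cons`, and `as_sequence` are not part of pycas or UIMA, and `Cons` only loosely mirrors the head/tail shape of UIMA's NonEmptyFSList (with `None` standing in for the empty list). A reader calls `as_sequence` and gets the same tuple whichever representation was stored; the last lines show the shared-tail case that only the linked form allows.

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Sequence, Union


# Hypothetical cons cell, loosely modeled on a head/tail list node.
# The empty list is represented by None (an assumption for this sketch).
@dataclass(frozen=True)
class Cons:
    head: object
    tail: Optional["Cons"] = None


def from_iterable(items) -> Optional[Cons]:
    """Build a linked list from any iterable, preserving order."""
    result = None
    for item in reversed(list(items)):
        result = Cons(item, result)
    return result


def iter_cons(cell: Optional[Cons]) -> Iterator[object]:
    """Walk a linked list from head to tail."""
    while cell is not None:
        yield cell.head
        cell = cell.tail


def as_sequence(value: Union[Sequence, Cons, None]) -> tuple:
    """One read-only view, whichever representation was stored."""
    if value is None or isinstance(value, Cons):
        return tuple(iter_cons(value))
    return tuple(value)


# A hypothetical coreference chain stored both ways.
mentions = ["Edward", "he", "him"]
chain_array = list(mentions)          # array-style: cheap random access
chain_list = from_iterable(mentions)  # list-style: cheap prepend

assert as_sequence(chain_array) == as_sequence(chain_list)

# Shared tail: a second chain reuses the same cells for "he", "him".
other = Cons("Ed", chain_list.tail)
print(as_sequence(other))             # ('Ed', 'he', 'him')
print(other.tail is chain_list.tail)  # True
```

The point of the design is that only `as_sequence` needs to know about both representations; code that merely reads a feature's elements never has to care which one the type system declared.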
(Of course, there are a few semantic differences between arrays and linked lists -- e.g., linked lists allow for shared tails -- but I don't think they're ones that people should be relying on!)

In an abstract sense, it would be nice if the data (and type) model could abstract away the difference and call them both sequences, and individual programs could decide whether specific features would be treated as lists or arrays. But of course that's more practical if the framework components are communicating via a serialization channel (e.g., XML) than if they're using shared memory.

-Edward
