Re: Python support for reading & writing UIMA data files

Edward Loper Mon, 18 Feb 2008 19:52:14 -0800

On Feb 18, 2008 7:53 PM, Eddie Epstein <[EMAIL PROTECTED]> wrote:
> Pycas looks like a pretty complete interface to the CAS, very nice.


Thanks.

> Given the interface with the XMI format, it wouldn't take too much
> more effort to create a pycas annotator that would be interoperable
> with the new uima-as framework extension in Java. [...]
> Is having an interoperable pycas service of interest, or do you see
> the main utility of pycas being offline post processing of CAS files?

I would be delighted to see pycas extended to become an interoperable
service; but I probably won't have any time to make it happen myself.

> *   Does the term 'feature structure' apply to all instances of
> TOP and its subclasses? [...]
>
> I think TOP should be analogous to Object in Java,  [...]
> However, primitive attributes like
> Integer and Float also derive from TOP.

Java differs from Python here -- in Python, *all* types (including
primitives) are essentially considered subclasses of the topmost type
(object).  You can even have bound methods of primitives -- e.g.,
"inc = (1).__add__".  So the notion that Integer and Float derive from
TOP makes perfect sense in the Python world.  Nevertheless, UIMA seems
to make a pretty strong distinction between primitive values and
non-primitive values.  And as far as I can tell from usage in the Java
code/docs, the term "feature structure" is basically used to mean
non-primitive values.
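
To make the Python side of that concrete, here is a minimal sketch
using only built-ins (nothing here is pycas or UIMA API; the name
"inc" is just the one from the example above):

```python
# In Python, primitive values are ordinary objects: int and float are
# subclasses of the topmost type, object.
assert issubclass(int, object)
assert issubclass(float, object)
assert isinstance(1, object)
assert isinstance(1.5, object)

# Because ints are objects, you can take a bound method from one.
# Here, inc is the __add__ method bound to the integer 1.
inc = (1).__add__

print(inc(41))  # prints 42
```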

>     * Are subclasses of String considered primitive or not? It appears
> from the Java code like they would return false for type.isPrimitive().
>
> Primitive sounds right to me.

OK, so this may be a bug in the Java implementation, then?

While on the topic of design issues, another thing that I've been
pondering: I don't see any principled way to make the decision of
whether a given sequence-typed feature should be defined as an array
vs a linked list.  E.g., should a coreference chain be a linked list
or an array?  It really depends on how the data will be used -- but
that same data may get used in different ways by different people.
(Of course, there are a few semantic differences between arrays &
linked lists -- e.g., linked lists allow for shared tails -- but I
don't think they're ones that people should be relying on!)  Ideally,
the data (and type) model could abstract away the difference and call
them both sequences; and individual programs could decide whether
specific features would be treated as lists or arrays.  But of course
that's more practical if the framework components are communicating
via a serialization channel (e.g., XML) than if they're using shared
memory.
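
As a rough illustration of what "call them both sequences" could look
like on the Python side, here is a sketch in which a cons-style linked
list and a plain Python list both satisfy collections.abc.Sequence, so
consumer code does not care which one it gets.  The names Cons and
describe_chain are made up for this example; they are not part of
pycas or UIMA.

```python
from collections.abc import Sequence


class Cons(Sequence):
    """Hypothetical singly linked list cell; a tail of None ends the list."""

    def __init__(self, head, tail=None):
        self.head = head
        self.tail = tail

    def __iter__(self):
        # Walk the cells once, front to back.
        node = self
        while node is not None:
            yield node.head
            node = node.tail

    def __len__(self):
        return sum(1 for _ in self)

    def __getitem__(self, index):
        # Integer indexing only; slices are left out to keep this short.
        if not isinstance(index, int):
            raise TypeError("index must be an int")
        if index < 0:
            index += len(self)
        if index < 0:
            raise IndexError("index out of range")
        node = self
        for _ in range(index):
            node = node.tail
            if node is None:
                raise IndexError("index out of range")
        return node.head


def describe_chain(chain: Sequence) -> str:
    """Consumer code that relies only on the Sequence interface."""
    return " -> ".join(chain)


# The same coreference chain, stored two ways.
chain_as_list = Cons("Edward", Cons("he", Cons("him")))
chain_as_array = ["Edward", "he", "him"]

print(describe_chain(chain_as_list))   # Edward -> he -> him
print(describe_chain(chain_as_array))  # Edward -> he -> him
```

Indexing the linked version is O(n) rather than O(1), which is the
usage-dependent trade-off mentioned above; the interface hides the
representation, not the cost.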

-Edward
