Edward,

Pycas looks like a pretty complete interface to the CAS, very nice. Given its support for the XMI format, it wouldn't take much more effort to create a pycas annotator that would be interoperable with the new uima-as framework extension in Java. Uima-as uses the XMI format CAS for the service interface, compliant with the OASIS standards work. Connectivity would be via ActiveMQ's Python client, see http://activemq.apache.org/python.html. Among many other things in ActiveMQ, we really liked the extensive language support for clients.
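
To make the idea concrete, here is a minimal sketch of the annotator half of such a service: a function that takes an XMI CAS as a string, adds annotations, and returns XMI. It uses only the Python standard library and does not assume anything about the pycas API. The Token type, its namespace URI, the whitespace tokenization, and the sample CAS are all made up for illustration, and the ActiveMQ wiring and uima-as message handling are left out entirely.

```python
import xml.etree.ElementTree as ET

XMI_NS = "http://www.omg.org/XMI"
CAS_NS = "http:///uima/cas.ecore"
# Hypothetical namespace for an example type system (an assumption,
# not something defined by pycas or UIMA).
EX_NS = "http:///org/mydomain.ecore"

# Illustrative XMI CAS with one Sofa and one (empty) View.
SAMPLE_XMI = (
    '<xmi:XMI xmlns:xmi="http://www.omg.org/XMI" '
    'xmlns:cas="http:///uima/cas.ecore" xmi:version="2.0">'
    '<cas:NULL xmi:id="0"/>'
    '<cas:Sofa xmi:id="1" sofaNum="1" sofaID="_InitialView" '
    'mimeType="text" sofaString="pycas reads XMI"/>'
    '<cas:View sofa="1" members=""/>'
    '</xmi:XMI>'
)


def handle_request(xmi_text):
    """Add one Token per whitespace-separated word and return the new XMI."""
    # Keep readable prefixes when the tree is serialized again.
    ET.register_namespace("xmi", XMI_NS)
    ET.register_namespace("cas", CAS_NS)
    ET.register_namespace("mydomain", EX_NS)

    root = ET.fromstring(xmi_text)
    id_attr = "{%s}id" % XMI_NS
    sofa = root.find("{%s}Sofa" % CAS_NS)
    view = root.find("{%s}View" % CAS_NS)
    text = sofa.get("sofaString", "")

    # Pick the next unused xmi:id.
    used = [int(el.get(id_attr)) for el in root if el.get(id_attr)]
    next_id = max(used) + 1

    members = view.get("members", "").split()
    pos = 0
    for word in text.split():
        begin = text.index(word, pos)
        end = begin + len(word)
        pos = end
        tok = ET.Element("{%s}Token" % EX_NS)
        tok.set(id_attr, str(next_id))
        tok.set("sofa", sofa.get(id_attr))
        tok.set("begin", str(begin))
        tok.set("end", str(end))
        # Insert new annotations just before the View element.
        root.insert(list(root).index(view), tok)
        members.append(str(next_id))
        next_id += 1

    # Index the new annotations in the view.
    view.set("members", " ".join(members))
    return ET.tostring(root, encoding="unicode")


reply_xmi = handle_request(SAMPLE_XMI)
```

A real service would call something like handle_request() from the message listener of whichever ActiveMQ client is used, sending the returned string back as the reply body.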

Is having an interoperable pycas service of interest, or do you see the main utility of pycas being offline post-processing of CAS files? More information on uima-as is at http://cwiki.apache.org/UIMA/uimaasdoc.html

From your questions in the documentation, going through an implementation of the CAS from scratch must have given you a good perspective on UIMA design issues. I'll start the discussion going with some of them below.

* Does the term 'feature structure' apply to all instances of TOP and its subclasses? Or just to instances that are neither primitives nor arrays? If the latter, then what term (if any) is used to describe instances of TOP & its subclasses?

I think TOP should be analogous to Object in Java, which would make TOP a feature structure. However, primitive attributes like Integer and Float also derive from TOP. This seems like something that could be cleaned up in the UIMA type system.

* Exactly which types should be considered inheritance final and feature final?
  * From the Java code, it looks like uima.cas.String is inheritance final; but clearly, it is possible to inherit from it: that's the whole point of the <allowedValues> element in the typeDescription!

More of the same?

* What should ViewCAS.get_view_name() do for a view with no sofa?

Only _InitialView is allowed to have no Sofa. All other views always have Sofas created when the view is created.

* Are subclasses of String considered primitive or not? It appears from the Java code like they would return false for type.isPrimitive().

Primitive sounds right to me.

Congratulations on getting pycas to this point.

Eddie

On Feb 18, 2008 5:12 PM, Edward Loper <[EMAIL PROTECTED]> wrote:
> I played around with the existing python support for uima, and wasn't
> really satisfied with it. It's all done through a swig interface to
> c++, and the result isn't exactly easy to use. So I put together a
> pure-python package that provides support for reading and writing UIMA
> CAS data files. The main motivation behind writing this package was to
> allow UIMA data to be read and written by Python programs in a manner
> that is natural to the Python language. Here's a very simple example
> use case:
>
>     >>> import pycas
>     >>> # Load a CAS from an XMI or an XCAS file:
>     >>> cas = pycas.xml.load_cas('myDocument.xml', 'myTypeSystem.xml')
>     >>> # Look up a type object from myTypeSystem.xml:
>     >>> Token = cas.type_system['org.mydomain.Token']
>     >>> # Iterate over all instances of that type, and perform some work:
>     >>> for token in cas.get_annotation_index(Token):
>     ...     token.someProperty = func(token.someOtherProperty)
>     >>> # Write the modified CAS to an XMI file:
>     >>> pycas.xml.save_cas(cas, 'myModifiedDocument.xml')
>
> I put up a temporary webpage for it:
>
>     http://www.cis.upenn.edu/~edloper/pycas/
>
> I'd like to release it as an open source project, but wanted to get
> feedback from the good uima folks at apache & ibm first. Some
> possibilities include: (a) releasing it as a standalone project; (b)
> incorporating it into the main UIMA project; and (c) adding it under
> the "corpus reader" subpackage of nltk (http://nltk.org). (The name
> "pycas" could be changed as well -- I picked it by analogy with jcas.)
>
> n.b.: pycas does not attempt to provide support for many of the
> "framework" features of UIMA, including the ability to combine
> processing components together to create applications. It focuses only
> on providing access to the data structures that UIMA uses to manage
> annotations. If someone else wants to extend what I've done, that's
> fine, but all I really wanted was convenient read/write access to UIMA
> data files.
>
> -Edward
