Re: CAS and CasView redesign - taking a step back
Adam Lally wrote:
> This chain of emails has a lot of tangled threads and this is an attempt to disentangle them and reevaluate what we're trying to achieve. I think there were basically two motivations for the whole ball of yarn:
> (1) We're unhappy with some of our APIs, in particular (a) the interface "CAS" can be an interface to either the whole CAS or to a view, and (b) the logic determining which "CAS" gets passed to an annotator's process method is confusing. We have a rare opportunity to clean some of this up in v2.1.

Part of the unhappiness for me is that what we've done is complex and hard to describe and document. +1 to the "rare opportunity" to clean some of this up.

> (2) We want to come up with a clean, consistent design for the CAS that is consistent with the current OASIS UIMA Specification proposal.
> Primarily I'm the one who has been pushing (2), but I think I've been getting a little ahead of myself on that front. I'm getting myself into the position of trying to be the advocate of what's in the UIMA spec and convincing the other committers (well mostly Thilo, as he is the one giving this the most attention) that it makes sense. But that does not seem like the right way to go about this.
> Taking a step back, the way this process ought to work is that the interested parties should work with the OASIS UIMA TC to produce an architecture spec that's agreeable, and then we'll figure out how to implement it. We can have discussions amongst the implementers here, but in the end we can't really decide any architecture issues on our own. It would not make sense for us to implement any major new designs unless we're sure they're going to be consistent with the forthcoming UIMA specification.

My worry with this line of thinking is that it may take a very long time for the OASIS process to converge. But I would like what we come up with to be in line with what we think the TC would eventually get to.

> So, in the coming months we need to make a significant effort to work with the OASIS UIMA TC on devising a CAS model that we can agree to. In the meantime, I think we should take off the table for 2.1 any significant realignment of the implementation with the spec.

I think I agree here - the focus should be on making the APIs sensible, easily understandable, and well architected (whatever that means :-) )

> That being said, we still have this opportunity to address some of our API issues (1) in v2.1, and I don't want to waste it. I think we should look at our most serious API issues and see if anything can be done about them. If there are things that we would otherwise want to do, and those things happen to be consistent with the current spec proposal, then great, we can do them now. But unless both of those conditions are met I think we may want to sit on our hands for now. Does that make sense?

Yes. Next step is to actually take a crack at stating the "most serious API issues".
-Marshall
Re: [jira] Commented: (UIMA-10) Split JCas into interface and implementation
Adam Lally (JIRA) wrote:
> [ http://issues.apache.org/jira/browse/UIMA-10?page=comments#action_12461223 ]
> Adam Lally commented on UIMA-10:
> I fixed a few more that you missed -- the example SimpleTextMerger and SimpleTextSegmenter, and the OpenNLP wrapper examples. The OpenNLP stuff is tricky because it's not in a source directory due to having a dependency on the OpenNLP jars which we don't ship. The only remaining occurrences of JCasImpl seem to be in JCas-generated classes and from other implementation classes. I still think that moving the static utility methods from JCasImpl to a new JCasUtil class would be worthwhile. Also there's the trick that EMF uses: you can't call a method JCas.method() where JCas is an interface, but you can do JCas.UTIL.method(); where UTIL is a static field on the JCas interface.

Is the filler of JCas.UTIL a reference to the JCasUtil "class"? e.g. static final Class UTIL = JCasUtil.class; ? And is the idea to replace references like: JCasUtil.(...) with JCas.UTIL.(...) ? I guess I'm unclear about what goal is being served by doing that?
-Marshall
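For readers following the JCas.UTIL question, here is a minimal sketch of how such a static field on an interface could look. It assumes UTIL holds an instance of a helper class rather than a Class object, which is only one possible reading of the suggestion. The types below merely borrow names from the thread; the describe() and getViewName() methods are hypothetical and none of this is the real org.apache.uima code.

```java
// Hypothetical sketch: these types only mirror names from the thread and are
// not the real org.apache.uima classes.

// Holds helper methods as instance methods so they can be reached through a field.
final class JCasUtil {
  // Illustrative helper; the thread does not list the actual utility methods.
  String describe(JCas jcas) {
    return "JCas view: " + jcas.getViewName();
  }
}

interface JCas {
  // Interface fields are implicitly public, static and final,
  // so callers can write JCas.UTIL.describe(...).
  JCasUtil UTIL = new JCasUtil();

  // Hypothetical accessor used only to give the helper something to do.
  String getViewName();
}

class UtilFieldDemo {
  public static void main(String[] args) {
    JCas jcas = () -> "exampleView";
    System.out.println(JCas.UTIL.describe(jcas));
  }
}
```

With this shape, callers depend only on the interface name and never mention the helper class directly. Since Java 8 an interface can also declare static methods itself, which makes the field trick unnecessary on newer platforms.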
[jira] Created: (UIMA-146) UimacppAnalysisComponent doesn't support ResultSpecification
UimacppAnalysisComponent doesn't support ResultSpecification Key: UIMA-146 URL: http://issues.apache.org/jira/browse/UIMA-146 Project: UIMA Issue Type: Bug Components: Core Java Framework Reporter: Adam Lally Assigned To: Eddie Epstein It looks like UimacppAnalysisComponent will always pass a null result specification through the JNI. This should be fixed once the C++ enablement layer is ready. Note that the new Java annotator interfaces have a setResultSpecification(ResultSpecification) method, rather than including a ResultSpecification on the process call. We could consider doing that in C++ as well. This has the advantage of notifying the annotator only when the result spec changes, so the annotator doesn't waste time checking the result spec on every process call. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
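The advantage described in this issue (notify the annotator only when the result spec changes) can be sketched roughly as follows. This is an illustration under stated assumptions, not the UIMA API: ResultSpec, CachingAnnotator and the type name "example.Token" are all hypothetical stand-ins.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a result specification: just a set of requested type names.
final class ResultSpec {
  private final Set<String> typeNames;

  ResultSpec(String... typeNames) {
    this.typeNames = new HashSet<>(Arrays.asList(typeNames));
  }

  boolean containsType(String typeName) {
    return typeNames.contains(typeName);
  }
}

// Setter style: the spec is inspected only when it changes, and the answer is cached.
final class CachingAnnotator {
  private boolean produceTokens = false;
  private int specInspections = 0;
  private int tokensProduced = 0;

  void setResultSpecification(ResultSpec spec) {
    specInspections++;
    produceTokens = spec.containsType("example.Token");
  }

  // process() no longer receives or checks the spec; it uses the cached flag.
  void process(String documentText) {
    String trimmed = documentText.trim();
    if (produceTokens && !trimmed.isEmpty()) {
      tokensProduced += trimmed.split("\\s+").length;
    }
  }

  int getSpecInspections() {
    return specInspections;
  }

  int getTokensProduced() {
    return tokensProduced;
  }
}

class ResultSpecDemo {
  public static void main(String[] args) {
    CachingAnnotator annotator = new CachingAnnotator();
    annotator.setResultSpecification(new ResultSpec("example.Token"));
    annotator.process("one two");
    annotator.process("three four five");
    System.out.println(annotator.getSpecInspections() + " spec inspection(s), "
        + annotator.getTokensProduced() + " tokens");
  }
}
```

The spec is inspected once however many documents are processed, which is the saving the issue text points to.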
Re: Comment cleanup?
On 12/28/06, Marshall Schor <[EMAIL PROTECTED]> wrote:
> Adam Lally wrote:
> > Yes, I'll fix those. The only ones I see are in AnalysisEngine.java.
> > Are there others I'm missing?
>
> When I scan the code for "ResultSpecification)" I get about 40 hits. Some of these are comments like this in test cases. Some are instances that are actually OK. There's a strange one in UimacppAnalysisComponent.java - makes me wonder if we've neglected to fix up the cpp interfaces with the new Result Specification approach.

Yes, looks like UimacppAnalysisComponent doesn't support ResultSpecifications at all anymore. We'll need to address that once the C++ enablement layer code is further along. I'll open a JIRA issue so we don't forget. Otherwise, those other references to ResultSpecification all look OK, except for a little bit of deprecated code that is no longer necessary and which I deleted.
-Adam
Re: svn commit: r490686 - /incubator/uima/uimaj/trunk/uima-docbooks/build.xml
It's used for example in the Ant view of Eclipse. If you don't specify it, your build file is called "project". No idea what else it is used for. Marshall Schor wrote: Hi Thilo - I could never figure out what the purpose of the name attribute on the <project> tag was. Is this just for some informal user documentation, or is it used somewhere? When I created the docbook project, I started with the Jakarta Velocity docbook package. This property seemed to be unused, and it was "wrong" (it was left over from the Jakarta Velocity project) and so it seemed to be something that took "maintenance" without any benefit... so I had removed it. -Marshall [EMAIL PROTECTED] wrote: Author: twgoetz Date: Thu Dec 28 01:40:45 2006 New Revision: 490686 URL: http://svn.apache.org/viewvc?view=rev&rev=490686 Log: Minor (no JIRA): add name attribute to project declaration in build.xml of uimaj-docbook. Modified: incubator/uima/uimaj/trunk/uima-docbooks/build.xml Modified: incubator/uima/uimaj/trunk/uima-docbooks/build.xml URL: http://svn.apache.org/viewvc/incubator/uima/uimaj/trunk/uima-docbooks/build.xml?view=diff&rev=490686&r1=490685&r2=490686 == --- incubator/uima/uimaj/trunk/uima-docbooks/build.xml (original) +++ incubator/uima/uimaj/trunk/uima-docbooks/build.xml Thu Dec 28 01:40:45 2006 @@ -19,7 +19,7 @@ --> - +
CAS and CasView redesign - taking a step back
This chain of emails has a lot of tangled threads and this is an attempt to disentangle them and reevaluate what we're trying to achieve. I think there were basically two motivations for the whole ball of yarn: (1) We're unhappy with some of our APIs, in particular (a) the interface "CAS" can be an interface to either the whole CAS or to a view, and (b) the logic determining which "CAS" gets passed to an annotator's process method is confusing. We have a rare opportunity to clean some of this up in v2.1. (2) We want to come up with a clean, consistent design for the CAS that is consistent with the current OASIS UIMA Specification proposal. Primarily I'm the one who has been pushing (2), but I think I've been getting a little ahead of myself on that front. I'm getting myself into the position of trying to be the advocate of what's in the UIMA spec and convincing the other committers (well mostly Thilo, as he is the one giving this the most attention) that it makes sense. But that does not seem like the right way to go about this. Taking a step back, the way this process ought to work is that the interested parties should work with the OASIS UIMA TC to produce an architecture spec that's agreeable, and then we'll figure out how to implement it. We can have discussions amongst the implementers here, but in the end we can't really decide any architecture issues on our own. It would not make sense for us to implement any major new designs unless we're sure they're going to be consistent with the forthcoming UIMA specification. So, in the coming months we need to make a significant effort to work with the OASIS UIMA TC on devising a CAS model that we can agree to. In the meantime, I think we should take off the table for 2.1 any significant realignment of the implementation with the spec. That being said, we still have this opportunity to address some of our API issues (1) in v2.1, and I don't want to waste it.
I think we should look at our most serious API issues and see if anything can be done about them. If there are things that we would otherwise want to do, and those things happen to be consistent with the current spec proposal, then great, we can do them now. But unless both of those conditions are met I think we may want to sit on our hands for now. Does that make sense? -Adam
Re: Comment cleanup?
Adam Lally wrote:
> Yes, I'll fix those. The only ones I see are in AnalysisEngine.java. Are there others I'm missing?

When I scan the code for "ResultSpecification)" I get about 40 hits. Some of these are comments like this in test cases. Some are instances that are actually OK. There's a strange one in UimacppAnalysisComponent.java - makes me wonder if we've neglected to fix up the cpp interfaces with the new Result Specification approach.
-Marshall
Re: Comment cleanup?
Yes, I'll fix those. The only ones I see are in AnalysisEngine.java. Are there others I'm missing? -Adam On 12/28/06, Marshall Schor <[EMAIL PROTECTED]> wrote: Adam - there are a bunch of comments in the code that look like: @link #process(, ResultSpecification) I think some of these (most?) need to be fixed to correspond to the new Results Specification. Can you take a look? -Marshall
[jira] Commented: (UIMA-10) Split JCas into interface and implementation
[ http://issues.apache.org/jira/browse/UIMA-10?page=comments#action_12461223 ] Adam Lally commented on UIMA-10: I fixed a few more that you missed -- the example SimpleTextMerger and SimpleTextSegmenter, and the OpenNLP wrapper examples. The OpenNLP stuff is tricky because it's not in a source directory due to having a dependency on the OpenNLP jars which we don't ship. The only remaining occurrences of JCasImpl seem to be in JCas-generated classes and from other implementation classes. I still think that moving the static utility methods from JCasImpl to a new JCasUtil class would be worthwhile. Also there's the trick that EMF uses: you can't call a method JCas.method() where JCas is an interface, but you can do JCas.UTIL.method(); where UTIL is a static field on the JCas interface. > Split JCas into interface and implementation > > > Key: UIMA-10 > URL: http://issues.apache.org/jira/browse/UIMA-10 > Project: UIMA > Issue Type: Improvement > Components: Core Java Framework >Reporter: Adam Lally > Assigned To: Adam Lally > Fix For: 2.1 > > > We should split the existing JCAS class into an interface > org.apache.uima.jcas.JCas and its implementation > org.apache.uima.jcas.impl.JCasImpl. This follows good design practices and > also is consistent with the rest of uimaj-core. It is important to do this > prior to our first release since it will be a non-compatible change for user > code.
Use of TCASRuntimeException?
Eddie - there are 40 references to this class - but I'm wondering if these need fixing if we're getting rid of TCAS? -Marshall
Comment cleanup?
Adam - there are a bunch of comments in the code that look like: @link #process(, ResultSpecification) I think some of these (most?) need to be fixed to correspond to the new Results Specification. Can you take a look? -Marshall
[jira] Resolved: (UIMA-10) Split JCas into interface and implementation
[ http://issues.apache.org/jira/browse/UIMA-10?page=all ] Marshall Schor resolved UIMA-10. Resolution: Fixed Assignee: Adam Lally (was: Marshall Schor) Thanks for pointing out the need to switch more things over to using JCas instead of JCasImpl. I did a workspace-wide search for all uses of JCasImpl, and I think I switched all that could be over to use JCas. Assigning to Lally to see if he can find any more things to fix :-) Others can of course contribute too. > Split JCas into interface and implementation > > > Key: UIMA-10 > URL: http://issues.apache.org/jira/browse/UIMA-10 > Project: UIMA > Issue Type: Improvement > Components: Core Java Framework >Reporter: Adam Lally > Assigned To: Adam Lally > Fix For: 2.1 > > > We should split the existing JCAS class into an interface > org.apache.uima.jcas.JCas and its implementation > org.apache.uima.jcas.impl.JCasImpl. This follows good design practices and > also is consistent with the rest of uimaj-core. It is important to do this > prior to our first release since it will be a non-compatible change for user > code.
Re: CAS Views and Sofas simplification
More later (time to take stock of where we are again, I think...) but first:

On 12/27/06, Thilo Goetz <[EMAIL PROTECTED]> wrote:
> Adam Lally wrote:
> > On 12/22/06, Marshall Schor <[EMAIL PROTECTED]> wrote:
> >> > Also, we have some uses of non-annotation indexes that are segregated
> >> > by Sofa (say, a Lemma index that's particular to a Sofa, where there's
> >> > actually no explicit link from the Lemma to the Sofa). A filtering
> >> > approach wouldn't work there,
> >> It could be made to work by adding a feature to the Lemma type which was
> >> a sofa reference. But maybe that's asking too much of the user?
> >
> > I'm not sure what is right here... this is a reasonable idea. But I
> > think in the absence of a clear sense of what is best I lean towards
> > staying closer to where we currently are, which is to have views
> > where the user explicitly decides which view to index things in.
>
> The whole point of those views, I thought, was to be able to segregate the data. So if you want lemmas for a certain view to be separate from the lemmas for different views, you should be able to achieve that with a lemma index that is specific to that view.

So you're agreeing with me, I think. (I'm the "> >" and the "> >> >" :)

> If you want to share lemmas from two views, share the index between the views. That's my mental model of how things should work. I like this better than adding sofa references for the following reasons:
> a) more space efficient, as there are no extra sofa references
> b) more time efficient, as you don't need to check the sofa references at indexing time
> c) no more complicated, as the user needs to reference something, the view or the sofa.
> This is how I would have done annotations as well. Maybe there are considerations that I'm not aware of, but I see no benefit to each annotation knowing what sofa it references.

Well, I think the main reason we did this is so that we could implement Annotation.getCoveredText(). Also we have the use case where we're doing translation and we have Annotations in the translated text that point back to corresponding Annotations in the original text. So if you're walking the Annotation index in the translated text and follow references that get you to another Annotation, how are you supposed to know which Sofa the Annotation you're looking at is supposed to be annotating? To me, just looking at this from a data modeling perspective, the purpose of an Annotation is to indicate some span of text, so it makes sense to model it with a reference to that text. But I suppose other interpretations are possible.

> Of course that would make a view-less approach from the global CAS that much harder...

Impossible, I think. We need to answer the question: are views a fundamental way of interacting with the CAS (*any* CAS implementation now or in the future, including raw XML manipulation) or not? The UIMA spec proposal says not, and there's at least one vocal proponent of that approach (Dan Gruhl). We of course could decide to become vocal proponents of the other approach, but it's not just ourselves we need to convince.
-Adam
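The translation use case in this message can be illustrated with a small sketch. It assumes a simplified data model in which every annotation carries a reference to the text it annotates; the Sofa and Annotation classes below, including the counterpart field, are hypothetical stand-ins rather than the real UIMA types.

```java
// Hypothetical, simplified model; not the real UIMA Sofa or Annotation types.
final class Sofa {
  final String id;
  final String text;

  Sofa(String id, String text) {
    this.id = id;
    this.text = text;
  }
}

final class Annotation {
  final Sofa sofa;              // the text this annotation refers to
  final int begin;              // inclusive start offset into sofa.text
  final int end;                // exclusive end offset into sofa.text
  final Annotation counterpart; // e.g. the matching span in the original text, or null

  Annotation(Sofa sofa, int begin, int end, Annotation counterpart) {
    this.sofa = sofa;
    this.begin = begin;
    this.end = end;
    this.counterpart = counterpart;
  }

  // Works without knowing which view the annotation was reached from,
  // because the annotation itself knows its text.
  String getCoveredText() {
    return sofa.text.substring(begin, end);
  }
}

class CoveredTextDemo {
  public static void main(String[] args) {
    Sofa english = new Sofa("en", "the red house");
    Sofa german = new Sofa("de", "das rote Haus");
    Annotation house = new Annotation(english, 8, 13, null);
    Annotation haus = new Annotation(german, 9, 13, house);
    // Following the cross-text reference still resolves against the right text.
    System.out.println(haus.getCoveredText() + " <- " + haus.counterpart.getCoveredText());
  }
}
```

Without the sofa field, the caller following haus.counterpart would need some other way to learn that the referenced annotation belongs to the English text, which is the difficulty raised above.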
Re: CAS object change notifications
On 12/27/06, Thilo Goetz <[EMAIL PROTECTED]> wrote:
> Jörn Kottmann wrote:
> > Hi,
> >
> > it would be nice to have a facility to listen to changes made to a CAS
> > object.
> >
> > In a SWT/RCP application it's likely that multiple views concurrently
> > display some aspects of the CAS object;
> > if one view now makes a change to the CAS object, all other views must be
> > synchronized.
> > An example: The editor adds a new annotation to the CAS, now the outline
> > view must display the annotation too.
> >
> > What do you think about introducing change notification support?
> >
> > Jörn
>
> It would be nice to have a generic CAS UI model (in the MVC sense) with that kind of support. On the other hand, I don't think the CAS itself is the appropriate place. How about an interface that extends the CAS interface and adds appropriate change notification support (along with an implementation, of course ;-).

We could also use a "decorator" that wraps the CAS interface and forwards methods to the real CAS while also sending notifications. There could be some switch to tell the framework to apply the decorator (maybe part of the performance tuning properties we already have). Regardless of whether we do extension or decoration, do we have to do this not only for CAS but also JCas and all the related interfaces like FeatureStructure, FSIndexRepository, etc? Could get messy, but all in all I think finding some way to do this is worth it. This could also be used for keeping an "audit trail" of CAS changes for debugging or provenance tracking.
-Adam
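A rough sketch of the decorator idea follows, assuming a deliberately tiny stand-in for the CAS interface. MiniCas, BasicMiniCas, CasChangeListener and NotifyingCas are hypothetical names invented for this illustration; the real CAS interface is far larger, which is the "could get messy" concern raised above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, heavily reduced stand-in for the CAS interface.
interface MiniCas {
  void addAnnotation(String type, int begin, int end);

  int annotationCount();
}

// Plain implementation with no notification support.
final class BasicMiniCas implements MiniCas {
  private final List<String> annotations = new ArrayList<>();

  @Override
  public void addAnnotation(String type, int begin, int end) {
    annotations.add(type + "[" + begin + "," + end + ")");
  }

  @Override
  public int annotationCount() {
    return annotations.size();
  }
}

// Hypothetical listener interface for change notifications.
interface CasChangeListener {
  void annotationAdded(String type, int begin, int end);
}

// Decorator: forwards every call to the wrapped CAS and notifies listeners on changes.
final class NotifyingCas implements MiniCas {
  private final MiniCas delegate;
  private final List<CasChangeListener> listeners = new ArrayList<>();

  NotifyingCas(MiniCas delegate) {
    this.delegate = delegate;
  }

  void addChangeListener(CasChangeListener listener) {
    listeners.add(listener);
  }

  @Override
  public void addAnnotation(String type, int begin, int end) {
    delegate.addAnnotation(type, begin, end);
    for (CasChangeListener listener : listeners) {
      listener.annotationAdded(type, begin, end);
    }
  }

  @Override
  public int annotationCount() {
    return delegate.annotationCount();
  }
}

class NotifyDemo {
  public static void main(String[] args) {
    NotifyingCas cas = new NotifyingCas(new BasicMiniCas());
    cas.addChangeListener((type, begin, end) ->
        System.out.println("outline view refresh: " + type + " " + begin + "-" + end));
    cas.addAnnotation("example.Token", 0, 5);
  }
}
```

The same listener hook could feed an audit trail instead of a UI refresh, since the decorator sees every change in order.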
Re: [jira] Resolved: (UIMA-143) JCas and CAS interface and impl refactoring for more sharing
On 12/27/06, Thilo Goetz <[EMAIL PROTECTED]> wrote:
> Changes look ok, but I'm wondering: we now have CAS, CommonCas, AbstractCas and AbstractCas_ImplBase, all in org.apache.uima.cas. Why is AbstractCas_ImplBase in the interfaces package? Can we please move it to the appropriate impl package? Any chance we can consolidate AbstractCas and CommonCas?

AbstractCas is meant to be the superinterface of any CAS interface we might ever have in UIMA - someday this might even be extensible by users. So we can't add the methods from CommonCas, which are specific to our particular CAS and JCas interfaces. The only methods on AbstractCas are those that the framework absolutely requires any CAS to implement. Possibly, the type name constants might be moved to AbstractCas, since those may be considered a core part of our CAS data, independent of the actual interface... but even this is not so clear to me. If we move closer to adopting Ecore/XMI we may have an interface that doesn't use the names like "uima.cas.Integer", but instead uses standard Ecore names. I think maybe we should leave AbstractCas alone for now. I'm OK with moving AbstractCas_ImplBase to the impl package. Either Marshall could do this while he's in there or I can do it later. The only argument I can think of for why I didn't put it under impl is that someday this might be extensible by users, and our other ImplBase classes that are extensible by users are also in the "interfaces" package. (It's not really a package intended for only interfaces, but a "public API" package.) However, even if this does become extensible the vast majority of users would not care, so I think it would be fine to put it in the impl package.
-Adam
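The layering argument in this reply (a minimal superinterface that the framework relies on, with richer interfaces extending it) might be sketched as below. All names are hypothetical, including the release() and getDocumentText() methods, and were chosen only to show why interface-specific methods are kept out of the base type.

```java
// Hypothetical sketch of the layering; none of these are the real UIMA interfaces.

// Base type: only what the framework needs from any kind of CAS.
interface MinimalCas {
  void release();
}

// Richer interface for one particular CAS flavor; its methods stay out of the base type.
interface TextCasApi extends MinimalCas {
  String getDocumentText();
}

final class SimpleTextCas implements TextCasApi {
  private final String text;
  private boolean released = false;

  SimpleTextCas(String text) {
    this.text = text;
  }

  @Override
  public String getDocumentText() {
    return text;
  }

  @Override
  public void release() {
    released = true;
  }

  boolean isReleased() {
    return released;
  }
}

// Framework-side code depends only on the base type, so it also works with
// CAS interfaces that do not exist yet.
final class FrameworkSketch {
  static void recycle(MinimalCas cas) {
    cas.release();
  }
}

class LayeringDemo {
  public static void main(String[] args) {
    SimpleTextCas cas = new SimpleTextCas("some document text");
    FrameworkSketch.recycle(cas);
    System.out.println("released: " + cas.isReleased());
  }
}
```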
[jira] Assigned: (UIMA-10) Split JCas into interface and implementation
[ http://issues.apache.org/jira/browse/UIMA-10?page=all ] Adam Lally reassigned UIMA-10: -- Assignee: Marshall Schor (was: Adam Lally) > Split JCas into interface and implementation > > > Key: UIMA-10 > URL: http://issues.apache.org/jira/browse/UIMA-10 > Project: UIMA > Issue Type: Improvement > Components: Core Java Framework >Reporter: Adam Lally > Assigned To: Marshall Schor > Fix For: 2.1 > > > We should split the existing JCAS class into an interface > org.apache.uima.jcas.JCas and its implementation > org.apache.uima.jcas.impl.JCasImpl. This follows good design practices and > also is consistent with the rest of uimaj-core. It is important to do this > prior to our first release since it will be a non-compatible change for user > code.
[jira] Reopened: (UIMA-10) Split JCas into interface and implementation
[ http://issues.apache.org/jira/browse/UIMA-10?page=all ] Adam Lally reopened UIMA-10: I've started reviewing this... so far I see several occurrences of JCasImpl in interface code - for example in JCasAnnotator_ImplBase, JCasMultiplier_ImplBase, and AnalysisEngine. I think anything outside of an impl package should always refer to the JCas interface, never the JCasImpl. > Split JCas into interface and implementation > > > Key: UIMA-10 > URL: http://issues.apache.org/jira/browse/UIMA-10 > Project: UIMA > Issue Type: Improvement > Components: Core Java Framework >Reporter: Adam Lally > Assigned To: Adam Lally > Fix For: 2.1 > > > We should split the existing JCAS class into an interface > org.apache.uima.jcas.JCas and its implementation > org.apache.uima.jcas.impl.JCasImpl. This follows good design practices and > also is consistent with the rest of uimaj-core. It is important to do this > prior to our first release since it will be a non-compatible change for user > code.
Re: adding missing methods to CAS / JCas interfaces: getView(String local_sofa_name) and getView(SofaFS)
On 12/26/06, Marshall Schor <[EMAIL PROTECTED]> wrote:
> It appears that some of these methods which are public in the CAS impl were missing in the interface, and missing in the interface/impl in the JCas. I'm adding them - if this is wrong, let me know :-)

To me it looks like these were already present on the CAS interface. getView(String) is the preferred way of switching views since 2.0. +1 to adding the corresponding methods on the JCas interface.
-Adam
Re: Refactoring / cleanup : getting rid of sofa2jcasMap
On 12/26/06, Marshall Schor <[EMAIL PROTECTED]> wrote:
> CASImpl has a hashmap sofa2jcasMap. I think it is superfluous. It is used in the method getJCas(SofaFS). But I think that method can be replaced with a new definition that doesn't use this map: getView(SofaFS).getJCas(); Is there a reason we want to have this other hash map?

No reason I can think of. +1 to remove it.
-Adam
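The proposed replacement, getView(SofaFS).getJCas(), can be pictured with the sketch below. It assumes that the CAS already keeps one view object per sofa and that each view lazily creates and caches its own JCas wrapper; the thread does not spell out either detail, and SofaKey, CasView, JCasView and CasSketch are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins; not the real SofaFS, CAS view, or JCas types.
final class SofaKey {
  final String id;

  SofaKey(String id) {
    this.id = id;
  }
}

final class JCasView {
  final CasView view;

  JCasView(CasView view) {
    this.view = view;
  }
}

final class CasView {
  final SofaKey sofa;
  private JCasView jcas; // created lazily, at most once per view

  CasView(SofaKey sofa) {
    this.sofa = sofa;
  }

  JCasView getJCas() {
    if (jcas == null) {
      jcas = new JCasView(this);
    }
    return jcas;
  }
}

final class CasSketch {
  // Assumed to exist already: one view per sofa.
  private final Map<SofaKey, CasView> sofa2view = new HashMap<>();

  CasView getView(SofaKey sofa) {
    return sofa2view.computeIfAbsent(sofa, CasView::new);
  }

  // No separate sofa-to-JCas map: the lookup goes through the view.
  JCasView getJCas(SofaKey sofa) {
    return getView(sofa).getJCas();
  }
}

class ViewLookupDemo {
  public static void main(String[] args) {
    CasSketch cas = new CasSketch();
    SofaKey sofa = new SofaKey("exampleSofa");
    System.out.println("same wrapper on repeated lookup: "
        + (cas.getJCas(sofa) == cas.getJCas(sofa)));
  }
}
```

Under these assumptions a second map keyed by sofa would only duplicate what the view already remembers.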
Re: svn commit: r490686 - /incubator/uima/uimaj/trunk/uima-docbooks/build.xml
Hi Thilo - I could never figure out what the purpose of the name attribute on the <project> tag was. Is this just for some informal user documentation, or is it used somewhere? When I created the docbook project, I started with the Jakarta Velocity docbook package. This property seemed to be unused, and it was "wrong" (it was left over from the Jakarta Velocity project) and so it seemed to be something that took "maintenance" without any benefit... so I had removed it. -Marshall [EMAIL PROTECTED] wrote: Author: twgoetz Date: Thu Dec 28 01:40:45 2006 New Revision: 490686 URL: http://svn.apache.org/viewvc?view=rev&rev=490686 Log: Minor (no JIRA): add name attribute to project declaration in build.xml of uimaj-docbook. Modified: incubator/uima/uimaj/trunk/uima-docbooks/build.xml Modified: incubator/uima/uimaj/trunk/uima-docbooks/build.xml URL: http://svn.apache.org/viewvc/incubator/uima/uimaj/trunk/uima-docbooks/build.xml?view=diff&rev=490686&r1=490685&r2=490686 == --- incubator/uima/uimaj/trunk/uima-docbooks/build.xml (original) +++ incubator/uima/uimaj/trunk/uima-docbooks/build.xml Thu Dec 28 01:40:45 2006 @@ -19,7 +19,7 @@ --> - +
Re: Question about DTD in DocBook files
Thilo Goetz wrote:
> I'm starting to port the CVD documentation to DocBook. For my XML editing, I use the Eclipse WebTools, which can give me great syntax support -- if it can find the DTD. This works when I change the DTD address from http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd to ../../../docbook/docbook-xml-4.4/docbookx.dtd Even if I could make the web URL work, I would still prefer the local file, as I may work on this when I'm not connected to the network. Anybody see any problem with that?

Might be OK, but all these kinds of issues are "supposed" to be handled by XML Catalogs. These map official DTD names to local files. "DocBook XSL: The Complete Guide" has a whole chapter on this, which is available online: http://www.sagehill.net/docbookxsl/Catalogs.html I use XMLBuddy in Eclipse, which has all kinds of Eclipse-style DTD provided help (such as auto-completion, showing you only "valid" choices of tags, attributes, attribute-values, etc., all based on the DTDs). It seems to work fine with the XML Catalogs. (Note: these Catalogs are part of the uima-docbooks project).
-Marshall
Question about DTD in DocBook files
I'm starting to port the CVD documentation to DocBook. For my XML editing, I use the Eclipse WebTools, which can give me great syntax support -- if it can find the DTD. This works when I change the DTD address from http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd to ../../../docbook/docbook-xml-4.4/docbookx.dtd Even if I could make the web URL work, I would still prefer the local file, as I may work on this when I'm not connected to the network. Anybody see any problem with that? Marshall? Thanks, Thilo
[jira] Created: (UIMA-145) Port CVD documentation to DocBook
Port CVD documentation to DocBook - Key: UIMA-145 URL: http://issues.apache.org/jira/browse/UIMA-145 Project: UIMA Issue Type: Improvement Components: Documentation Reporter: Thilo Goetz Assigned To: Thilo Goetz Priority: Minor