Hi Marshall,

Sorry for the delayed response. I've been on vacation for a week.

This is extremely helpful, and I thank you for taking the time to explain it in such depth. I'll let you know if I run into any more roadblocks.

Andrew

At 09:15 AM 8/30/2007, you wrote:
Hi Andrew and everyone reading this thread -

There are some misconceptions, I think, in this thread, perhaps caused by some imprecise language in our Javadocs.

The way external resources are shared among annotators is that the XML descriptors specify an "interface" and an "implementation". Multiple external resource dependencies can be specified to share the same implementation. Aggregates can override the implementation specified in their delegates. One instance of the implementation class is created by the framework; this instance is "shared" by all annotators running in the same JVM.

This shared implementation can be user-written, and can do anything it wants. For instance, it could keep an in-memory copy of some data and make it available to all the annotators sharing this; the design of both the interface and the concrete implementation class is up to you.

That being said, the framework supplies some example interfaces / implementations for this, one of which is "DataResource". If you look at the implementation of DataResource, you can see that the thought behind the Javadoc comment "if you directly access the resource, the benefits of the ResourceManager (caching and sharing) are lost" is perhaps misleading. The "caching" being contemplated here was to read a remote (assume slow to access) file and write it out in the local file system, to be accessed more quickly. However ***this is not implemented***. The impl code (which you can see on-line if you don't want to download the source; for release 2.2 it is at
http://svn.apache.org/viewvc/incubator/uima/uimaj/tags/uimaj-2.2.0/uimaj-2.2.0-incubating/uimaj-core/src/main/java/org/apache/uima/resource/impl/DataResource_impl.java?view=markup ) says

/**
 * A simple {@link DataResource} implementation that can read data from a file via a URL. There is
 * an attribute for specifying the location of a local cache for a remote file, but this is not
 * currently being used.
 */

The getInputStream() method of this class, when called by different annotators sharing the same instance, will return a new, unshared input stream each time it is called.

So - I think this resource is probably not what you want. You might want to implement your own resource, to do exactly the kind of sharing you want. If you do, please keep in mind the different possible deployment alternatives that others using your components may set up. For instance, if they deploy things with some scale-out where multiple instances are running concurrently in the same JVM, then you will need to ensure that your implementation is thread-safe and follows the rules of the Java memory model. This will involve using the "synchronized" or "volatile" keywords, for instance, in appropriate spots.

Furthermore, if your components could be deployed in some arrangement where some of them are running on different JVMs (perhaps scaled out across multiple hosts, for instance), then to actually share data, you'll need to use the same techniques used in web servers that do this - for instance, putting shared data into a database, and having all the parts access that database.

I hope this is helpful, but please let me know if I've misunderstood the questions...

-Marshall

Andrew Shirk wrote:
> Hi Michael,
>
> Yes, that's the approach I started with, but the DataResource javadoc
> indicates that if you directly access the resource, the benefits of
> the ResourceManager (caching and sharing) are lost. Furthermore, if
> in my SharedResourceObject implementation I make modifications to the
> resource, then it will be out of sync with the ResourceManager's
> cache. The next annotator very well may get the stale version of the
> resource.
>
> Thilo, I'm afraid that's the approach I may end up having to use, but
> it's really a kludge.
>
> Is there no global variable space, outside of the CAS, for the entire
> aggregate? If there were, that would be the best solution I think...
>
> Thanks for the suggestions.
>
> Andrew
>
> At 11:27 AM 8/29/2007, you wrote:
>> Another possibility is external resources. When defining external
>> resources, one or more annotators can share the same resource.
>> The UIMA framework takes care of the resource's life cycle.
>> You will find some documentation about external resources in the UIMA
>> reference guide at 2.4.1.10. External Resource Dependencies.
>> You can also check the UIMA examples - tutorial ex6 uses external
>> resources. (apache-uima/examples/descriptors/tutorial/ex6)
>>
>> -- Michael
>>
>> Thilo Goetz wrote:
>>> If this happens often, one idea might be just to
>>> stick the information in the CAS. That way you
>>> can even run several instances of this pipeline
>>> and it will still work ;-) Of course you're not
>>> persisting the info that way, not sure if this is
>>> a requirement or not.
>>>
>>> --Thilo
>>>
>>> Andrew Shirk wrote:
>>>
>>>> What is the best practice for sharing read/write resources amongst
>>>> analysis engines in an aggregate? For example, say you have an
>>>> annotator early in a flow that reads a configuration file off disk
>>>> in order to determine its behavior. Then, the next annotator does
>>>> something, and needs to write changes to the configuration file so
>>>> that another annotator downstream, whose behavior is also determined
>>>> by the contents of the configuration file, can read in the resource
>>>> that contains the changes.
>>>>
>>>> Does this make sense?
>>>>
>>>> Any help or ideas would be appreciated. I can think of some ugly
>>>> hacks, but it would be nice to know if I'm missing some portion of
>>>> the API that supports this type of scenario.
>>>>
>>>> Thanks, Andrew
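For anyone finding this thread later: the user-written, thread-safe shared resource that Marshall describes might look roughly like the sketch below. It is only an illustration under stated assumptions, not code from UIMA. The class name SharedConfig and its get/put methods are hypothetical, and the file is assumed to hold simple key=value pairs. In a real deployment the class would also implement org.apache.uima.resource.SharedResourceObject and do its parsing in load(DataResource), reading from the resource's getInputStream(); that wiring is left out so the sketch compiles without UIMA on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * Hypothetical in-memory configuration shared by several annotators in one JVM.
 * Assumption: the backing file contains key=value pairs (java.util.Properties format).
 */
final class SharedConfig {
    // ConcurrentHashMap makes individual reads and writes safe across threads
    // and guarantees that a completed put is visible to later gets.
    private final ConcurrentMap<String, String> values = new ConcurrentHashMap<>();

    /** Parse the configuration once; later reads and writes use the in-memory copy. */
    void load(InputStream in) {
        Properties props = new Properties();
        try {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (String name : props.stringPropertyNames()) {
            values.put(name, props.getProperty(name));
        }
    }

    /** Returns the current value, or null if the key was never set. */
    String get(String key) {
        return values.get(key);
    }

    /** Records a change that downstream annotators sharing this instance will see. */
    void put(String key, String value) {
        values.put(key, value);
    }

    public static void main(String[] args) {
        SharedConfig config = new SharedConfig();
        byte[] data = "mode=fast\n".getBytes(StandardCharsets.UTF_8);
        config.load(new ByteArrayInputStream(data));

        // An upstream annotator changes a setting...
        config.put("mode", "slow");
        // ...and a downstream annotator holding the same instance reads it.
        System.out.println(config.get("mode"));
    }
}
```

Because every annotator in the JVM receives the same instance, a change made through put is what later annotators read through get, with no stale copy to go out of sync. This covers the single-JVM case only; if a sequence of reads and writes has to be atomic as a group you would need additional locking, and across JVMs you are back to something like the database approach Marshall mentions.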
