Hi Mark, I don't think this is naive at all. I'll describe a few areas where UIMA adds value (this is not a complete list :-) ).
UIMA requires that components declare, in external XML files, things like the "types" they produce and the types they require as inputs. This facilitates different "roles" played by people with different skills. For instance, an annotator writer (someone who writes components) might be skilled in NLP algorithms and how to implement them efficiently. Someone else might not know those details, but be better at "assembling" and "configuring" components, perhaps written independently by different people, to address a particular need. This person would use tooling that works from the external XML files mentioned above; the Component Descriptor Editor tool that comes with UIMA is an example of this kind of tool.

Another role is facilitated by UIMA-AS: the "deployer" might have a configured set of components that is running too slowly, and after some analysis of where the time is spent (facilitated perhaps by the framework facilities that compute this), might decide to "scale out" the particular steps / components that are the bottleneck, running them on multiple machines in a cluster. The original component writer knows nothing about these possibilities - the framework "adds value" by keeping these concerns separated from each other, and by providing services that address them.

The first a-ha experience I had with the framework happened in a class: we were running some sets of components, and then with a few simple commands some people in the class ran part of the pipeline as a "service" on their machines, while others in the class used those components - without rewriting the components, and without doing any special coding to make this happen (other than changing some XML configuration files).
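To make the descriptor idea concrete, here is a minimal sketch of the kind of XML file a component writer ships alongside the code. The names (example.MyTokenizer, example.Token) are made up for illustration; a real descriptor has a few more sections, but the key parts for the assembly role are the type definitions and the declared inputs/outputs:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
  <primitive>true</primitive>
  <!-- the Java class implementing the annotator ("example.MyTokenizer" is a made-up name) -->
  <annotatorImplementationName>example.MyTokenizer</annotatorImplementationName>
  <analysisEngineMetaData>
    <name>MyTokenizer</name>
    <!-- the types this component works with -->
    <typeSystemDescription>
      <types>
        <typeDescription>
          <name>example.Token</name>
          <supertypeName>uima.tcas.Annotation</supertypeName>
        </typeDescription>
      </types>
    </typeSystemDescription>
    <!-- what the component requires as input and what it produces, so an
         assembler (or a tool like the Component Descriptor Editor) can wire
         components together without reading their source code -->
    <capabilities>
      <capability>
        <inputs/>
        <outputs>
          <type>example.Token</type>
        </outputs>
      </capability>
    </capabilities>
  </analysisEngineMetaData>
</analysisEngineDescription>
```

Because this information lives outside the code, the assembler can check that one component's outputs satisfy another's inputs without ever looking at the Java.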
The framework did the work of using current networking technology to accomplish all this, and the algorithm writer (who might have been a scientific programmer who wasn't keeping up with the rapid changes in the world of networking components) didn't need to know any of the details to take advantage of it.

-Marshall

Mark Ettinger wrote:
> Hello all,
>
> I am a trained mathematician/computer scientist/programmer jumping into NLP
> and excited by the challenge but intimidated by the algorithm and software
> options. Specifically, I am at University of Texas and am charged with
> putting to good use our large database of (more-or-less unused) clinical
> notes. My strategy is roughly:
>
> 1. Learn the theory of NLP and Information Extraction.
> 2. Understand the publicly available software packages so as to avoid
>    reinventing the wheel.
> 3. Apply #2 to our database and begin experimenting.
>
> My question in this post centers on #2. Not being a software engineer (though
> having lots of scientific programming experience), I am sometimes puzzled by
> "frameworks" and "components". I think of everything as libraries of
> functions. Yes, I know this view is outdated. I can wrap my head around NLP
> packages like Lingpipe and NLTK but am unclear what a package like UIMA
> offers over and above these types of pure libraries.
>
> Given what I've told you about my background (scientist, programmer, but NOT
> software engineer) can someone explain to me how investing the time to learn
> UIMA will pay off in the long run? I've started to dig into the UIMA api but
> thought I'd throw this rather basic question out there, hoping someone
> wouldn't think it too naive for this forum.
>
> Thanks in advance!
>
> Mark Ettinger
