Removing descriptor files from ClearTK

Philip Ogren Thu, 04 Jun 2009 09:07:16 -0700

I would like to weigh in on the recent discussion (previously titled "Parameters in uima descriptors") w.r.t. our thinking about descriptor files in our UIMA project, ClearTK. The last time we got together we decided to move away from providing descriptor files for our project and towards providing static factory methods for creating *Description objects (e.g. AnalysisEngineDescription). If you check out the code and look at it now, you will see that there are still descriptor files scattered throughout our code and that we have started adding these factory methods, but realizing this goal is still in progress. (see http://cleartk.googlecode.com) These methods will serve two purposes: 1) to allow users to directly instantiate our components in Java and 2) to guide users in how to write descriptor files for our components. While we understand the purpose and necessity of descriptor files, we are not going to provide them for the following reasons:

1) Maintaining descriptor files is a giant pain in the butt. The developers of ClearTK are two graduate students and a postdoc, and we do not have the resources (or patience) to maintain these files. We have found that as we have evolved and refactored our code, our descriptor files are constantly breaking and are absurdly burdensome to maintain. I don't want to call out others in this conversation (please chime in as you will!) but I have had a number of conversations with developers on several other UIMA projects and I am not alone in my loathing of maintaining descriptor files. The maintenance is particularly burdensome for descriptor files that you might create for your unit tests. They are constantly breaking, they are tedious to fix, and they discourage code refactoring and evolution by their mere presence (let me tell you how I really feel!)

2) We cannot create all possible descriptor files that might be needed to use ClearTK in the ways desired by the user. Our library relies heavily on dynamic class loading driven by class names provided in configuration parameters. For example, when you are writing training data for a particular machine learning classifier you can specify the class name of the data writer to be used (e.g. one for maxent or libsvm). These data writers may require additional configuration parameters that must be set in the descriptor file. Therefore, what ends up in the descriptor file is determined by a specific use case and is not constrained to a fixed set of configuration parameters.

It is our goal to make it easy for users of ClearTK to make descriptor files that are specific to a user's use case/scenario by 1) creating factory methods that demonstrate common ways that our components can be described (i.e. the user can study these methods when writing their own descriptor file), 2) naming our configuration parameters according to a strict naming convention which points the user to the canonical definition and documentation for a configuration parameter (e.g. "org.cleartk.classifier.InstanceConsumer.PARAM_ANNOTATION_HANDLER"), and 3) providing documentation on how to do this.
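To make the factory-method idea and the naming convention a bit more concrete, here is a rough sketch in plain Java. It is not ClearTK code and it does not use the UIMA API: a Map stands in for a real *Description object, and every class, method, and parameter name in it (DescriptorFactorySketch, DataWriter, MaxentDataWriter, DataWriterAnnotator, PARAM_DATA_WRITER, PARAM_OUTPUT_DIRECTORY) is a hypothetical stand-in made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java stand-ins only: nothing here is a real ClearTK or UIMA type.
class DescriptorFactorySketch {

    /** Hypothetical data writer, chosen at run time by class name. */
    interface DataWriter {
        String format();
    }

    /** Hypothetical writer that needs its own, writer-specific parameter. */
    static class MaxentDataWriter implements DataWriter {
        // Convention: parameter name = declaring class name + "." + constant name.
        static final String PARAM_OUTPUT_DIRECTORY =
                MaxentDataWriter.class.getName() + ".PARAM_OUTPUT_DIRECTORY";

        public MaxentDataWriter() {}

        @Override
        public String format() {
            return "maxent";
        }
    }

    /** Hypothetical component that loads the data writer named in its settings. */
    static class DataWriterAnnotator {
        static final String PARAM_DATA_WRITER =
                DataWriterAnnotator.class.getName() + ".PARAM_DATA_WRITER";
    }

    /** Factory method: builds the settings a descriptor file would otherwise hold. */
    static Map<String, Object> createMaxentWriterDescription(String outputDirectory) {
        Map<String, Object> settings = new LinkedHashMap<>();
        settings.put(DataWriterAnnotator.PARAM_DATA_WRITER, MaxentDataWriter.class.getName());
        settings.put(MaxentDataWriter.PARAM_OUTPUT_DIRECTORY, outputDirectory);
        return settings;
    }

    /** Instantiates the data writer whose class name is stored in the settings. */
    static DataWriter loadDataWriter(Map<String, Object> settings)
            throws ReflectiveOperationException {
        String className = (String) settings.get(DataWriterAnnotator.PARAM_DATA_WRITER);
        return Class.forName(className)
                .asSubclass(DataWriter.class)
                .getDeclaredConstructor()
                .newInstance();
    }

    /** Recovers the declaring class name from a convention-following parameter name. */
    static String declaringClassOf(String parameterName) {
        return parameterName.substring(0, parameterName.lastIndexOf('.'));
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Map<String, Object> settings = createMaxentWriterDescription("target/training");
        System.out.println(loadDataWriter(settings).format());
        for (String name : settings.keySet()) {
            System.out.println(name + " is documented in " + declaringClassOf(name));
        }
    }
}
```

The point of the sketch is that the set of parameters depends on which data writer is chosen, so a factory method per common configuration is easier to keep in sync with the code than a fixed descriptor file, and the parameter name itself tells the reader which class to look at for documentation.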


Here are a few more points that I want to make:

- we are not ruling out providing some descriptor files, esp. for configurations that we think/hope will be useful to have for running some of our components "out-of-the-box". Much of our code is intended as a framework for users of ClearTK to create their own components using common machine learning approaches. While we could anticipate the general structure of descriptor files for such user-generated components (and we have tried), we have decided that descriptors such as these in particular will no longer be provided in ClearTK.
- we are not getting rid of type system descriptor files.
- we are not saying that descriptor files are of no use. They are clearly very nice to have for sharing and for deployment. I do not use them myself for setting up and running experiments in my research, and we feel that for the above reasons we do not have to provide them.


I hope this clarifies the discussion.

Philip
