Re: suggestion for default pipelines
Yes. I was thinking of the use case where, for example, the ytex component needs SentenceDetectorA but the dictionary lookup component expects SentenceDetectorB. It's probably not too common, but it is something to consider with the cool dynamic/plug-and-play pipelines idea.

On Apr 28, 2014, at 5:46 AM, Richard Eckart de Castilho r...@apache.org wrote:

At the time a factory method becomes callable, the Maven/Ivy magic should already have taken place, no?

-- Richard

On 27.04.2014, at 17:52, Chen, Pei pei.c...@childrens.harvard.edu wrote:

My vote would be for the latter. Have the Factory create pipelines instead. It could just be a naming thing, though.

+1 for building dynamic pipelines. I think this idea has been thrown around for some time, but it hasn't really been worked on, so it would be cool to see it in action. I think the tricky part is handling pipeline dependencies, i.e., a similar concept to Maven/Ivy.

On Apr 24, 2014, at 5:48 PM, Miller, Timothy timothy.mil...@childrens.harvard.edu wrote:

Any preference for separate factory classes:

    class SentenceDetectorAnnotatorFactory:
        static AnalysisEngineDescription getSentenceDetectorAnnotator()

VS static methods added to primitive annotators:

    class SentenceDetector (existing)
        static AnalysisEngineDescription getSentenceDetectorAnnotator()

?

The former can clutter up the class space, while the latter extends the length of classes, especially if there are multiple versions (getUMLSDictionaryAnnotator(), getICD9DictionaryAnnotator(), getMeshDictionaryAnnotator(), etc.)

Tim

On 04/16/2014 04:48 AM, Richard Eckart de Castilho wrote:

It would be nice if uimaFIT provided a Maven plugin to automatically generate descriptors for aggregates. Maybe if we come up with a convention for factories (e.g. a class with static methods that do not take any parameters and that return descriptors, or methods that bear a specific Java annotation, e.g. @AutoGenerateDescriptor), it should be possible to implement such a Maven plugin.

Cheers,

-- Richard

On 16.04.2014, at 05:21, Steven Bethard steven.beth...@gmail.com wrote:

+1. And note that once you have a descriptor, you can generate the XML, so we should arrange to replace the current XML descriptors with ones generated automatically from the uimaFIT code. That should reduce some synchronization problems when the Java code was changed but the XML descriptor was not.

Steve

On Tue, Apr 15, 2014 at 8:52 AM, Miller, Timothy timothy.mil...@childrens.harvard.edu wrote:

The discussion in the other thread with Abraham Tom gave me an idea I wanted to float to the list. We have been using some uimaFIT pipeline builders in the temporal project that maybe could be moved into clinical-pipeline. For example, look at this file:

http://svn.apache.org/viewvc/ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/pipelines/TemporalExtractionPipeline_ImplBase.java?view=markup

with the static methods getPreprocessorAggregateBuilder() and getLightweightPreprocessorAggregateBuilder() [no umls].

So my idea would be to create a class in clinical-pipeline (CTakesPipelines) with static methods for some standard pipelines (to return AnalysisEngineDescriptions instead of AggregateBuilders?):

getStandardUMLSPipeline() -- builds the pipeline currently in AggregatePlaintextUMLSProcessor.xml
getFullPipeline() -- same as above but with SRL, constituency parsing, etc., every component in ctakes

We could then potentially merge our entry points -- I think Abraham's experience points out that this is currently confusing, as well as probably not implemented optimally. For example, either ClinicalPipelineWithUmls or BagOfCUIsGenerator would use that static method to run a uimaFIT-style pipeline. Maybe we can slowly deprecate our XML descriptors too, unless people feel strongly about keeping those around.

Another benefit is that the cTAKES API is then trivial -- if you import ctakes into your pom file, getting a UIMA pipeline is one uimaFIT call:

builder.add(CTakesPipelines.getStandardUMLSPipeline());

I think this would actually be pretty easy to implement, but I am hoping to get some feedback on whether this is a good direction.

Tim

--
Tim Miller
Instructor
Boston Children's Hospital and Harvard Medical School
timothy.mil...@childrens.harvard.edu
617-919-1223
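To make the dependency concern at the top of this message concrete (one component needing SentenceDetectorA while another expects SentenceDetectorB), here is a rough sketch of how a dynamic pipeline builder might detect such a clash before running anything. Everything in it is hypothetical: PipelineConflictCheck, Component, findConflicts and the "sentenceDetector" role name are invented for illustration and are not part of cTAKES or uimaFIT.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: none of these types exist in cTAKES or uimaFIT.
class PipelineConflictCheck {

    /** A component name plus the implementation it requires for each upstream role. */
    static class Component {
        final String name;
        final Map<String, String> requires; // role -> required implementation

        Component(String name, Map<String, String> requires) {
            this.name = name;
            this.requires = requires;
        }
    }

    /** Returns one message per case where a component asks for a different implementation of an already claimed role. */
    static List<String> findConflicts(List<Component> components) {
        Map<String, String> chosenImpl = new LinkedHashMap<>(); // role -> first requested implementation
        Map<String, String> chosenBy = new LinkedHashMap<>();   // role -> component that requested it first
        List<String> conflicts = new ArrayList<>();
        for (Component c : components) {
            for (Map.Entry<String, String> req : c.requires.entrySet()) {
                String role = req.getKey();
                String impl = req.getValue();
                String existing = chosenImpl.putIfAbsent(role, impl);
                if (existing == null) {
                    chosenBy.put(role, c.name);
                } else if (!existing.equals(impl)) {
                    conflicts.add(role + ": " + chosenBy.get(role) + " needs " + existing
                            + ", " + c.name + " needs " + impl);
                }
            }
        }
        return conflicts;
    }

    public static void main(String[] args) {
        List<Component> pipeline = List.of(
                new Component("ytex", Map.of("sentenceDetector", "SentenceDetectorA")),
                new Component("dictionaryLookup", Map.of("sentenceDetector", "SentenceDetectorB")));
        findConflicts(pipeline).forEach(System.out::println);
    }
}
```

A real solution would also have to decide what to do on a conflict (fail, run both detectors into separate views, or let the user choose); the sketch only reports it.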
Re: suggestion for default pipelines
My vote would be for the latter. Have the Factory create pipelines instead. It could just be a naming thing, though.

+1 for building dynamic pipelines. I think this idea has been thrown around for some time, but it hasn't really been worked on, so it would be cool to see it in action. I think the tricky part is handling pipeline dependencies, i.e., a similar concept to Maven/Ivy.
Re: suggestion for default pipelines
It would be nice if uimaFIT provided a Maven plugin to automatically generate descriptors for aggregates. Maybe if we come up with a convention for factories (e.g. a class with static methods that do not take any parameters and that return descriptors, or methods that bear a specific Java annotation, e.g. @AutoGenerateDescriptor), it should be possible to implement such a Maven plugin.

Cheers,

-- Richard
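The factory convention suggested in this message could be discovered with plain Java reflection, roughly as sketched below. This is only an illustration under assumptions: the @AutoGenerateDescriptor annotation is declared here as a hypothetical (it is a name floated in the thread, not an existing uimaFIT annotation), ExamplePipelines is a made-up class, and String stands in for a real descriptor type so the example runs without UIMA on the classpath.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the annotation and all class names are assumptions.
class DescriptorFactoryScan {

    /** Marker annotation; retained at runtime so reflection can see it. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface AutoGenerateDescriptor {}

    /** Stand-in factory class; String stands in for a descriptor type. */
    static class ExamplePipelines {
        @AutoGenerateDescriptor
        public static String getStandardUMLSPipeline() { return "standard-umls"; }

        // Not annotated, so the scan skips it.
        public static String helper() { return "helper"; }

        // Annotated but takes a parameter, so it does not match the convention.
        @AutoGenerateDescriptor
        public static String getCustomPipeline(String name) { return name; }
    }

    /** Collects names of static, no-argument methods carrying the marker annotation. */
    static List<String> findFactoryMethods(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Method m : type.getDeclaredMethods()) {
            boolean matches = m.isAnnotationPresent(AutoGenerateDescriptor.class)
                    && Modifier.isStatic(m.getModifiers())
                    && m.getParameterCount() == 0;
            if (matches) {
                names.add(m.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(findFactoryMethods(ExamplePipelines.class));
    }
}
```

A build plugin following this idea would presumably invoke each discovered method and serialize the returned descriptor, rather than just listing names as this sketch does.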
Re: suggestion for default pipelines
+1. And note that once you have a descriptor, you can generate the XML, so we should arrange to replace the current XML descriptors with ones generated automatically from the uimaFIT code. That should reduce some synchronization problems when the Java code was changed but the XML descriptor was not.

Steve