Re: suggestion for default pipelines

2014-04-28 Thread Chen, Pei
Yes. I was thinking of a use case where, for example, the ytex component needs 
SentenceDetectorA but the dictionary lookup component expects SentenceDetectorB. 
It's probably not too common, but it is something to consider with the cool 
dynamic/plug-and-play pipelines idea. 
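
One way to picture the conflict described above is a check that runs before a pipeline is assembled. The following is only a rough sketch under stated assumptions: ComponentSpec, PipelineConflicts and the "Sentence" type key are hypothetical names, not cTAKES or uimaFIT classes, and each component is assumed to declare which provider it expects for each annotation type.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a pipeline component: a name plus the provider
// it expects for each annotation type it consumes.
final class ComponentSpec {
    final String name;
    final Map<String, String> requires; // annotation type -> expected provider

    ComponentSpec(String name, Map<String, String> requires) {
        this.name = name;
        this.requires = requires;
    }
}

final class PipelineConflicts {
    private PipelineConflicts() {}

    // Returns one message for each case where a component asks for a different
    // provider than an earlier component did; an empty list means no conflict.
    static List<String> find(List<ComponentSpec> components) {
        Map<String, String> chosen = new LinkedHashMap<>();   // type -> provider
        Map<String, String> chosenBy = new LinkedHashMap<>(); // type -> first requester
        List<String> conflicts = new ArrayList<>();
        for (ComponentSpec c : components) {
            for (Map.Entry<String, String> need : c.requires.entrySet()) {
                String previous = chosen.putIfAbsent(need.getKey(), need.getValue());
                if (previous == null) {
                    chosenBy.put(need.getKey(), c.name);
                } else if (!previous.equals(need.getValue())) {
                    conflicts.add(need.getKey() + ": " + chosenBy.get(need.getKey())
                            + " needs " + previous + ", " + c.name
                            + " needs " + need.getValue());
                }
            }
        }
        return conflicts;
    }
}
```

A dynamic pipeline builder could refuse to build, or insert both detectors into separate views, when this list is non-empty; which of those is appropriate is an open design question, not something settled in this thread.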

 On Apr 28, 2014, at 5:46 AM, Richard Eckart de Castilho r...@apache.org 
 wrote:
 
 At the time a factory method becomes callable, the Maven/Ivy-magic should 
 already have taken place, no?
 
 -- Richard
 
 On 27.04.2014, at 17:52, Chen, Pei pei.c...@childrens.harvard.edu wrote:
 
 My vote would be for the latter. Have the Factory create pipelines 
 instead. It could just be a naming thing though...
 
 +1 for building dynamic pipelines. I think this idea has been thrown around 
 for some time, but it hasn't really been worked on, so it would be cool to see 
 it in action. I think the tricky part is handling pipeline dependencies, i.e. 
 a concept similar to Maven/Ivy. 
 
 On Apr 24, 2014, at 5:48 PM, Miller, Timothy 
 timothy.mil...@childrens.harvard.edu wrote:
 
 Any preference for separate factory classes:
 
 class SentenceDetectorAnnotatorFactory:
 
 static AnalysisEngineDescription getSentenceDetectorAnnotator()
 
 VS
 
 static methods added to primitive annotators:
 
 class SentenceDetector (existing)
 
 static AnalysisEngineDescription getSentenceDetectorAnnotator()
 
 ?
 
 The former can clutter up the class space while the latter extends the
 length of classes, especially if there are multiple versions
 (getUMLSDictionaryAnnotator(), getICD9DictionaryAnnotator(),
 getMeshDictionaryAnnotator(), etc.)
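
 The two layouts in the question above can be compared side by side. This is a minimal sketch, not cTAKES code: Description is a stand-in for UIMA's AnalysisEngineDescription so that the example compiles without UIMA on the classpath, and SentenceDetector here is an empty placeholder for the existing annotator. With uimaFIT 2.x available, the method bodies would presumably be a call such as AnalysisEngineFactory.createEngineDescription(SentenceDetector.class) instead.

```java
// Stand-in for AnalysisEngineDescription (an assumption for this sketch);
// it only records which annotator class the description refers to.
final class Description {
    final String annotatorClass;

    Description(String annotatorClass) {
        this.annotatorClass = annotatorClass;
    }
}

// Option 1: a separate factory class per annotator.
// Adds one class per annotator but keeps the annotator itself unchanged.
final class SentenceDetectorAnnotatorFactory {
    private SentenceDetectorAnnotatorFactory() {}

    static Description getSentenceDetectorAnnotator() {
        return new Description(SentenceDetector.class.getName());
    }
}

// Option 2: a static method added to the (existing) primitive annotator.
// No extra class, but the annotator grows with every configuration variant.
class SentenceDetector {
    static Description getSentenceDetectorAnnotator() {
        return new Description(SentenceDetector.class.getName());
    }
}
```

 Callers see the same return value either way, so the choice mostly affects where the code lives and how long the classes get.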
 
 Tim
 
 On 04/16/2014 04:48 AM, Richard Eckart de Castilho wrote:
 It would be nice if uimaFIT provided a Maven plugin to automatically
 generate descriptors for aggregates. Maybe if we come up with a 
 convention for factories, e.g. a class with static methods that do
 not take any parameters and that return descriptors, or methods
 that bear a specific Java annotation (e.g. @AutoGenerateDescriptor),
 it should be possible to implement such a Maven plugin.
 
 Cheers,
 
 -- Richard
 
 On 16.04.2014, at 05:21, Steven Bethard steven.beth...@gmail.com wrote:
 
 +1. And note that once you have a descriptor, you can generate the
 XML, so we should arrange to replace the current XML descriptors with
 ones generated automatically from the uimaFIT code. That should reduce
 some synchronization problems when the Java code was changed but the
 XML descriptor was not.
 
 Steve
 
 On Tue, Apr 15, 2014 at 8:52 AM, Miller, Timothy
 timothy.mil...@childrens.harvard.edu wrote:
 The discussion in the other thread with Abraham Tom gave me an idea I
 wanted to float to the list. We have been using some uimaFIT pipeline
 builders in the temporal project that maybe could be moved into
 clinical-pipeline. For example, look at this file:
 
 http://svn.apache.org/viewvc/ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/pipelines/TemporalExtractionPipeline_ImplBase.java?view=markup
 
 with the static methods getPreprocessorAggregateBuilder() and
 getLightweightPreprocessorAggregateBuilder()   [no umls].
 
 So my idea would be to create a class in clinical-pipeline
 (CTakesPipelines) with static methods for some standard pipelines (to
 return AnalysisEngineDescriptions instead of AggregateBuilders?):
 
 getStandardUMLSPipeline()  -- builds pipeline currently in
 AggregatePlaintextUMLSProcessor.xml
 getFullPipeline() -- same as above but with SRL, constituency parsing,
 etc., every component in ctakes
 
 We could then potentially merge our entry points -- I think Abraham's
 experience points out that this is currently confusing, as well as
 probably not implemented optimally. For example, either
 ClinicalPipelineWithUmls or BagOfCUIsGenerator would use that static
 method to run a uimaFIT-style pipeline. Maybe we can slowly deprecate
 our XML descriptors too, unless people feel strongly about keeping those
 around.
 
 Another benefit is that the cTAKES API is then trivial -- if you import
 ctakes into your pom file, getting a UIMA pipeline is one uimaFIT call:
 
 builder.add(CTAKESPipelines.getStandardUMLSPipeline());
 
 
 I think this would actually be pretty easy to implement, but hoping to
 get some feedback on whether this is a good direction.
 
 Tim
 
 -- 
 Tim Miller
 Instructor
 Boston Children's Hospital and Harvard Medical School
 timothy.mil...@childrens.harvard.edu
 617-919-1223
 


Re: suggestion for default pipelines

2014-04-27 Thread Chen, Pei
My vote would be for the latter. Have the Factory create pipelines instead. 
It could just be a naming thing though...

+1 for building dynamic pipelines. I think this idea has been thrown around for 
some time, but it hasn't really been worked on, so it would be cool to see it in 
action. I think the tricky part is handling pipeline dependencies, i.e. a 
concept similar to Maven/Ivy. 



Re: suggestion for default pipelines

2014-04-16 Thread Richard Eckart de Castilho
It would be nice if uimaFIT provided a Maven plugin to automatically
generate descriptors for aggregates. Maybe if we come up with a 
convention for factories, e.g. a class with static methods that do
not take any parameters and that return descriptors, or methods
that bear a specific Java annotation (e.g. @AutoGenerateDescriptor),
it should be possible to implement such a Maven plugin.
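
The discovery half of such a plugin could be sketched with plain reflection. Everything below is illustrative and assumed: the annotation is declared locally only because it is the name floated above (this sketch does not claim uimaFIT ships one), ExamplePipelines and DescriptorMethodScanner are made-up names, and the factory methods return a String as a stand-in for a real descriptor type.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical marker annotation, retained at runtime so reflection can see it.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface AutoGenerateDescriptor {}

// Example factory following the proposed convention.
final class ExamplePipelines {
    private ExamplePipelines() {}

    // Picked up: static, annotated, no parameters.
    @AutoGenerateDescriptor
    static String getStandardPipeline() { return "standard"; }

    // Skipped: takes a parameter, so it cannot be called without input.
    @AutoGenerateDescriptor
    static String getCustomPipeline(String name) { return name; }

    // Skipped: not annotated.
    static String helper() { return "helper"; }
}

final class DescriptorMethodScanner {
    private DescriptorMethodScanner() {}

    // Finds static, parameterless methods bearing the marker annotation.
    static List<Method> scan(Class<?> factory) {
        List<Method> found = new ArrayList<>();
        for (Method m : factory.getDeclaredMethods()) {
            if (Modifier.isStatic(m.getModifiers())
                    && m.getParameterCount() == 0
                    && m.isAnnotationPresent(AutoGenerateDescriptor.class)) {
                found.add(m);
            }
        }
        // getDeclaredMethods() has no guaranteed order, so sort by name.
        found.sort(Comparator.comparing(Method::getName));
        return found;
    }
}
```

A real plugin would then invoke each method it found and serialize the returned descriptor to XML during the build; that part is left out here because it depends on the UIMA types.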

Cheers,

-- Richard




Re: suggestion for default pipelines

2014-04-15 Thread Steven Bethard
+1. And note that once you have a descriptor, you can generate the
XML, so we should arrange to replace the current XML descriptors with
ones generated automatically from the uimaFIT code. That should reduce
some synchronization problems when the Java code was changed but the
XML descriptor was not.

Steve
