Thanks Peter, I will implement it and submit a patch.
Best

On Fri 09/01/2015, 15:51, Peter Klügl <[email protected]> wrote:
> As for reusing the tokenization, maybe we should add something like this
> logic (reusing as default):
>
> for each seeder
>   if there are annotations of my seeding types
>     if new config param is true // for partial or corrupt tokenizations
>       remove all seeding annotations and generate them anew
>     else
>       do nothing
>   else // no tokenization yet
>     generate seeding annotations
>
> Best,
>
> Peter
>
>
> On 09.01.2015 at 15:44, Peter Klügl wrote:
> > Hi,
> >
> > On 09.01.2015 at 15:28, Silvestre Losada wrote:
> >> Hi Peter
> >>
> >> I missed this email. I see your point about analysis engines arbitrarily
> >> changing the annotations. However, that can already happen now if a script
> >> uses the EXEC action to execute an external analysis engine. I think an
> >> extra parameter could be added to Ruta to specify whether the Ruta
> >> tokenization, RutaAnnotations and RutaStream can be reused. I think it may
> >> be possible to reuse the Ruta tokenization (annotation stream) across the
> >> same CAS.
> > Yes, this should be possible, or let me say it this way: the
> > tokenization of one seeder should be reused in any case. Other scripts
> > may apply additional seeders, but that probably won't be the common
> > case. Reusing RutaStream will be complicated, especially for
> > multi-view/cas-multiplier pipelines. I think the best way is to share
> > and update the RutaBasics.
> >
> > There are many options to improve the performance when applying several
> > analysis engines in a normal UIMA pipeline. Especially the internal
> > indexing should be improved. The main reason why these improvements are
> > not yet implemented can probably be found in our use cases (no parallel
> > execution, applying one complex script, no need for high performance).
> >
> > I am open to all improvements. In my opinion, we should create a test
> > pipeline as a unit test and then optimize all aspects.
> >
> > Best,
> >
> > Peter
> >
> >
> >> Best Silvestre.
> >>
> >> On 31 December 2014 at 13:31, Peter Klügl <[email protected]> wrote:
> >>
> >>> On 29.12.2014 at 16:24, Silvestre Losada wrote:
> >>>
> >>>> Thanks for your answer. I was working in this way and it seems to be the
> >>>> best approach. The problem here is that I need to set up several
> >>>> RutaEngines in the pipe. It would be nice if the RutaStream, or at least
> >>>> the generated Ruta annotations, could be reused from one RutaEngine to
> >>>> another RutaEngine in the same pipe, to avoid duplicated information. If
> >>>> you wish, I can implement it and submit a patch to you.
> >>>>
> >>> Oh yes, this causes a real slowdown when applying several scripts within a
> >>> pipeline. All help is welcome :-)
> >>>
> >>> The main problem is that Ruta requires additional indexing information for
> >>> conditions like PARTOF (which otherwise would be terribly slow). I don't
> >>> think that reusing the RutaStream would help because there could be an
> >>> arbitrary analysis engine changing arbitrary annotations. The RutaBasic
> >>> annotations are already reused to some extent, but the indexing is done
> >>> again. My first guess would be that we add another configuration parameter
> >>> with a list of all types that analysis engines applied after the last Ruta
> >>> engine may have changed. Some helper methods could set these values
> >>> automatically given a pipeline. We could also use the capabilities of the
> >>> engines, but I am not sure that they are always correctly set.
> >>>
> >>> What do you think?
> >>>
> >>> Best,
> >>>
> >>> Peter
> >>>
> >>>
> >>>> Kind regards.
> >>>>
> >>>> On 19 December 2014 at 17:54, Peter Klügl <[email protected]>
> >>>> wrote:
> >>>>
> >>>>> On 19.12.2014 at 15:10, Silvestre Losada wrote:
> >>>>>> Hi Jens,
> >>>>>>
> >>>>>> First of all, thanks for your detailed answer.
> >>>>>> UIMA Ruta has an option to execute an analysis engine from a Ruta
> >>>>>> script; it is described here: <http://goo.gl/ekbhv8>. So inside the
> >>>>>> script you can execute the analysis engine and then apply some rules to
> >>>>>> the annotations created by the analysis engine. What I want is to have
> >>>>>> the option to execute the analysis engines in parallel to save time.
> >>>>>> Would it be possible?
> >>>>>>
> >>>>> That's not possible in the sense that you use more or other processes for
> >>>>> the contained analysis engine than for the Ruta script. The analysis
> >>>>> engine and the rules can be parallelized together as one analysis engine,
> >>>>> namely that of the script.
> >>>>>
> >>>>> You should probably extract the analysis engine into a pipeline, which
> >>>>> applies the analysis engine and then the script (resp. its analysis
> >>>>> engine). Then, the normal UIMA-AS setting applies.
> >>>>>
> >>>>> Best,
> >>>>>
> >>>>> Peter
> >>>>>
> >>>>>
> >>>>>> Kind regards
> >>>>>>
> >>>>>> On 19 December 2014 at 12:35, Jens Grivolla <[email protected]> wrote:
> >>>>>>
> >>>>>>> Hi Silvestre,
> >>>>>>>
> >>>>>>> there doesn't seem to be anything RUTA-specific in your question. In
> >>>>>>> principle, UIMA-AS allows parallel scaleout and merges the results
> >>>>>>> (though I personally have never used it this way), but there are of
> >>>>>>> course a few things to take into account.
> >>>>>>>
> >>>>>>> First, you will of course need to properly define the dependencies
> >>>>>>> between your different analysis engines to ensure you always have all
> >>>>>>> the necessary information available, meaning that you can only run
> >>>>>>> things in parallel that are independent of one another.
> >>>>>>> And then you will have to see if the overhead from distributing your
> >>>>>>> CAS to several engines running in parallel and then merging the results
> >>>>>>> is not greater than just having it in one colocated pipeline that can
> >>>>>>> pass the information more efficiently. I guess you'll have to benchmark
> >>>>>>> your specific application, but maybe somebody with more experience can
> >>>>>>> give you some general directions...
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Jens
> >>>>>>>
> >>>>>>> On Thu, Dec 18, 2014 at 12:26 PM, Silvestre Losada <
> >>>>>>> [email protected]> wrote:
> >>>>>>>
> >>>>>>>> Well, let me explain.
> >>>>>>>>
> >>>>>>>> Ruta scripts are really good at working over the output of analysis
> >>>>>>>> engines: each analysis engine does some atomic work, and using Ruta
> >>>>>>>> rules you can easily work over the generated annotations, combine
> >>>>>>>> them, remove them... What I need is to execute several analysis
> >>>>>>>> engines in parallel to improve the response time. Right now the
> >>>>>>>> analysis engines are executed sequentially, and I want to execute them
> >>>>>>>> in parallel, then take the output of all of them and apply some Ruta
> >>>>>>>> rules to the output.
> >>>>>>>> Would that be possible?
> >>>>>>>>
> >>>>>>>> On 17 December 2014 at 18:13, Peter Klügl <[email protected]>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> I haven't used UIMA-AS (with Ruta) in a real application yet, but I
> >>>>>>>>> tested it once for an rc. Did you face any problems?
> >>>>>>>>>
> >>>>>>>>> Best
> >>>>>>>>>
> >>>>>>>>> Peter
> >>>>>>>>>
> >>>>>>>>> On 17.12.2014 at 14:34, Silvestre Losada wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi All,
> >>>>>>>>>>
> >>>>>>>>>> Is there any way to execute Ruta scripts in parallel, using the
> >>>>>>>>>> UIMA-AS approach?
> >>>>>>>>>> If so, could you provide me an example?
> >>>>>>>>>>
> >>>>>>>>>> Kind regards.
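
For reference, the seeder logic proposed in pseudocode in this thread (reuse an existing tokenization by default, regenerate only when a new config parameter asks for it) could be sketched in plain Java roughly as follows. This is only an illustration under stated assumptions, not Ruta code: the `Seeder` interface, the `WhitespaceSeeder`, the `forceReseed` flag and the map used in place of the CAS annotation index are all hypothetical stand-ins invented for the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SeedingReuseSketch {

    /** Hypothetical stand-in for a seeder: it owns exactly one seeding type. */
    interface Seeder {
        String seedType();
        List<String> generate(String documentText);
    }

    /** Toy whitespace tokenizer, present only to make the sketch runnable. */
    static class WhitespaceSeeder implements Seeder {
        public String seedType() {
            return "Token";
        }

        public List<String> generate(String documentText) {
            List<String> tokens = new ArrayList<>();
            for (String t : documentText.trim().split("\\s+")) {
                if (!t.isEmpty()) {
                    tokens.add(t);
                }
            }
            return tokens;
        }
    }

    /**
     * Applies the reuse logic from the thread. The map stands in for the
     * annotation index of one CAS (seeding type -> annotations).
     * Returns how many seeders actually generated annotations.
     */
    static int seed(Map<String, List<String>> index, List<Seeder> seeders,
                    String text, boolean forceReseed) {
        int generated = 0;
        for (Seeder seeder : seeders) {
            List<String> existing = index.get(seeder.seedType());
            boolean present = existing != null && !existing.isEmpty();
            if (present && !forceReseed) {
                continue; // default: reuse the existing tokenization
            }
            // Either no tokenization yet, or the (assumed) config parameter
            // asks to discard a partial or corrupt one and generate it anew.
            index.put(seeder.seedType(), seeder.generate(text));
            generated++;
        }
        return generated;
    }

    public static void main(String[] args) {
        Map<String, List<String>> index = new HashMap<>();
        List<Seeder> seeders = Arrays.<Seeder>asList(new WhitespaceSeeder());
        System.out.println(seed(index, seeders, "a b c", false)); // first engine seeds
        System.out.println(seed(index, seeders, "a b c", false)); // second engine reuses
        System.out.println(seed(index, seeders, "a b c", true));  // forced regeneration
    }
}
```

In a real pipeline the check would have to look at the CAS annotation index for each seeder's types, and the flag would be an engine configuration parameter; both are left abstract here on purpose.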
