Hi Peter,

Sorry, I missed this email. I see your point about analysis engines changing annotations arbitrarily. However, that can already happen today if a script uses the EXEC action to run an external analysis engine. I think an extra parameter could be added to Ruta to specify whether the Ruta tokenization, the RutaAnnotations and the RutaStream can be reused. It may be possible to reuse the Ruta tokenization (the annotation stream) across engines working on the same CAS.
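A rough sketch of the reuse idea, in plain Python rather than against the real Ruta or UIMA API. Everything here is a made-up stand-in for illustration: `ToyCas`, `build_index`, `prepare_ruta_index`, and the two parameters `reuse_index` and `changed_types` do not exist in Ruta. The point is only to show how a reuse flag could combine with a list of types that other engines may have changed, so that only those types are indexed again.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCas:
    """Toy stand-in for a CAS: text plus annotation spans grouped by type name."""
    text: str
    annotations: dict = field(default_factory=dict)  # type name -> list of (begin, end)
    ruta_index: dict = field(default_factory=dict)   # hypothetical cached per-type index
    index_builds: int = 0                            # how many per-type indexing passes ran


def build_index(cas, type_name):
    """Stand-in for the expensive indexing step: sort the spans of one type."""
    cas.index_builds += 1
    return sorted(cas.annotations.get(type_name, []))


def prepare_ruta_index(cas, reuse_index=False, changed_types=()):
    """Build or refresh the per-type index stored in the CAS.

    reuse_index   -- hypothetical flag: keep what an earlier Ruta engine left behind
    changed_types -- hypothetical list of types other engines may have modified since
    """
    if not reuse_index:
        cas.ruta_index = {}  # current behaviour in this toy model: start from scratch
    for type_name in cas.annotations:
        if type_name not in cas.ruta_index or type_name in changed_types:
            cas.ruta_index[type_name] = build_index(cas, type_name)
    return cas.ruta_index


cas = ToyCas("John runs", {"Token": [(5, 9), (0, 4)], "Person": [(0, 4)]})
prepare_ruta_index(cas)                    # first Ruta engine indexes both types
cas.annotations["Person"].append((5, 9))   # an external engine touches Person only
prepare_ruta_index(cas, reuse_index=True, changed_types={"Person"})
print(cas.index_builds)  # 3: Token was indexed once, Person twice
```

Without `reuse_index=True` the second call would index both types again (four passes instead of three). How the real indexing information could be kept consistent is of course the hard part, and this toy model says nothing about that.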
Best,
Silvestre

On 31 December 2014 at 13:31, Peter Klügl <[email protected]> wrote:

> On 29.12.2014 at 16:24, Silvestre Losada wrote:
>
>> Thanks for your answer. I was working in this way and it seems to be
>> the best approach. The problem here is that I need to set up several
>> RutaEngines in the pipe. It would be nice if the RutaStream, or at
>> least the generated ruta annotations, could be reused from one
>> RutaEngine to another RutaEngine in the same pipe, to avoid duplicated
>> information. If you wish, I can implement it and submit a patch to you.
>
> Oh yes, this causes a real slowdown when applying several scripts within
> a pipeline. All help is welcome :-)
>
> The main problem is that ruta requires additional indexing information
> for conditions like PARTOF (which otherwise would be terribly slow). I
> don't think that reusing the RutaStream would help because there could
> be an arbitrary analysis engine changing arbitrary annotations. The
> RutaBasic annotations are already reused to some extent, but the
> indexing is done again. My first guess would be that we add another
> configuration parameter with a list of all types that analysis engines
> applied after the last ruta engine may have changed. Some helper methods
> could set these values automatically given a pipeline. We could also use
> the capabilities of the engines, but I am not sure that they are always
> correctly set.
>
> What do you think?
>
> Best,
>
> Peter
>
>> Kind regards.
>>
>> On 19 December 2014 at 17:54, Peter Klügl <[email protected]> wrote:
>>
>>> On 19.12.2014 at 15:10, Silvestre Losada wrote:
>>>
>>>> Hi Jens,
>>>>
>>>> First of all, thanks for your detailed answer. UIMA ruta has an
>>>> option to execute an analysis engine from a ruta script, which is
>>>> described here <http://goo.gl/ekbhv8>. So inside the script you can
>>>> execute the analysis engine and then apply some rules to the
>>>> annotations created by the analysis engine. What I want is to have
>>>> the option to execute the analysis engines in parallel to save time.
>>>> Would it be possible?
>>>
>>> That's not possible in the sense that you cannot use more or other
>>> processes for the contained analysis engine than for the ruta script.
>>> The analysis engine and the rules can be parallelized together as one
>>> analysis engine, namely that of the script.
>>>
>>> You should probably extract the analysis engine into a pipeline, which
>>> applies the analysis engine and then the script (resp. its analysis
>>> engine). Then, the normal UIMA-AS setting applies.
>>>
>>> Best,
>>>
>>> Peter
>>>
>>>> Kind regards
>>>>
>>>> On 19 December 2014 at 12:35, Jens Grivolla <[email protected]> wrote:
>>>>
>>>>> Hi Silvestre,
>>>>>
>>>>> there doesn't seem to be anything RUTA-specific in your question. In
>>>>> principle, UIMA-AS allows parallel scaleout and merges the results
>>>>> (though I personally have never used it this way), but there are of
>>>>> course a few things to take into account.
>>>>>
>>>>> First, you will of course need to properly define the dependencies
>>>>> between your different analysis engines to ensure you always have
>>>>> all the necessary information available, meaning that you can only
>>>>> run things in parallel that are independent of one another. And then
>>>>> you will have to see if the overhead from distributing your CAS to
>>>>> several engines running in parallel and then merging the results is
>>>>> not greater than just having it in one colocated pipeline that can
>>>>> pass the information more efficiently. I guess you'll have to
>>>>> benchmark your specific application, but maybe somebody with more
>>>>> experience can give you some general directions...
>>>>>
>>>>> Best,
>>>>> Jens
>>>>>
>>>>> On Thu, Dec 18, 2014 at 12:26 PM, Silvestre Losada <[email protected]> wrote:
>>>>>
>>>>>> Well, let me explain.
>>>>>>
>>>>>> Ruta scripts are really good for working over the output of
>>>>>> analysis engines. Each analysis engine does some atomic work, and
>>>>>> using ruta rules you can easily work over the generated
>>>>>> annotations, combine them, remove them... What I need is to execute
>>>>>> several analysis engines in parallel to improve the response time.
>>>>>> Right now the analysis engines are executed sequentially, and I
>>>>>> want to execute them in parallel, then take the output of all of
>>>>>> them and apply some ruta rules to the output.
>>>>>>
>>>>>> Would it be possible?
>>>>>>
>>>>>> On 17 December 2014 at 18:13, Peter Klügl <[email protected]> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I haven't used UIMA-AS (with ruta) in a real application yet, but
>>>>>>> I tested it once for an rc. Did you face any problems?
>>>>>>>
>>>>>>> Best
>>>>>>>
>>>>>>> Peter
>>>>>>>
>>>>>>> On 17.12.2014 at 14:34, Silvestre Losada wrote:
>>>>>>>
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> Is there any way to execute ruta scripts in parallel, using the
>>>>>>>> UIMA-AS approach? If so, could you provide me an example?
>>>>>>>>
>>>>>>>> Kind regards.
