John, I have looked at this project before. The way I use Spark doesn't need this complexity: just spread the texts across an RDD and pass them into the pipeline; done.
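
A minimal sketch of that approach might look like the following. It is not the code referred to in this message: it assumes Spark and uimaFIT 2.x are on the classpath, and `WhitespaceTokenAnnotator`, `UimaOnSpark.tokenize` and the sample texts are hypothetical names made up for illustration. The longer length comes mostly from the toy annotator and the setup code.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession
import org.apache.uima.fit.component.JCasAnnotator_ImplBase
import org.apache.uima.fit.factory.{AnalysisEngineFactory, JCasFactory}
import org.apache.uima.fit.util.JCasUtil
import org.apache.uima.jcas.JCas
import org.apache.uima.jcas.tcas.Annotation
import scala.collection.JavaConverters._

// Hypothetical stand-in for a real UIMA pipeline: one Annotation per whitespace-separated token.
class WhitespaceTokenAnnotator extends JCasAnnotator_ImplBase {
  override def process(jcas: JCas): Unit =
    for (m <- "\\S+".r.findAllMatchIn(jcas.getDocumentText))
      new Annotation(jcas, m.start, m.end).addToIndexes()
}

object UimaOnSpark {
  // Runs the annotator over every text and returns (text, covered texts of all annotations).
  def tokenize(texts: RDD[String]): RDD[(String, List[String])] =
    texts.mapPartitions { docs =>
      // The engine and the CAS are not serializable, so they are built once per partition on the executor.
      val engine = AnalysisEngineFactory.createEngine(classOf[WhitespaceTokenAnnotator])
      val jcas = JCasFactory.createJCas()
      docs.map { text =>
        jcas.reset() // reuse the same CAS for each document
        jcas.setDocumentText(text)
        engine.process(jcas)
        // Copy the results out of the CAS before it is reset for the next document.
        (text, JCasUtil.select(jcas, classOf[Annotation]).asScala.map(_.getCoveredText).toList)
      }
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("uima-on-spark").master("local[*]").getOrCreate()
    // Placeholder input; the texts could equally be read from a database, CSV files, PDFs...
    val texts = spark.sparkContext.parallelize(Seq("First note.", "Second note."))
    tokenize(texts).collect().foreach(println)
    spark.stop()
  }
}
```

Only the line that builds `texts` would change with the source; the `mapPartitions` part stays the same.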
About 10 lines of Scala code are then sufficient in my case, and they can be adapted to the source (database, CSV, PDFs...). Moreover, the GitHub project has been dead for 4 years...

On 15 Sept. 2017 at 16:32, Osborne, John D wrote:
> Thanks Richard and Nicholas,
>
> Nicholas - have you looked at SUIM (https://github.com/oaqa/suim) ?
>
> It's also doing UIMA on Spark - I'm wondering if you are aware of it and how
> it is different from your own project?
>
> Thanks for any info,
>
> -John
>
>
> ________________________________________
> From: Richard Eckart de Castilho [[email protected]]
> Sent: Friday, September 15, 2017 5:29 AM
> To: [email protected]
> Subject: Re: UIMA analysis from a database
>
> On 15.09.2017, at 09:28, Nicolas Paris <[email protected]> wrote:
> >
> > - UIMA-AS is another way to program UIMA
>
> Here you probably meant uimaFIT.
>
> > - UIMA-FIT is complicated
> > - UIMA-FIT only works with UIMA
>
> ... and I suppose you mean UIMA-AS here.
>
> > - UIMA only focuses on text Annotation
>
> Yep. Although it has also been used for other media, e.g. video and audio.
> But the core UIMA framework doesn't specifically consider these media.
> People who apply UIMA in the context of other media do so with custom
> type systems.
>
> > - UIMA is not good at:
> >   - text transformation
>
> It is not straight-forward but possible. E.g. the text normalizers in
> DKPro Core make use of either different views for different states of
> normalization or drop the original text and forward the normalized
> text within a pipeline by means of a CAS multiplier.
>
> >   - read data from source in parallel
> >   - write data to folder in parallel
>
> Not sure if these two are limitations of the framework
> rather than of the way that you use readers and writers
> in the particular scale-out mode you are working with.
>
> >   - machine learning interface
>
> UIMA doesn't offer ML as part of the core framework because
> that is simply not within the scope of what the UIMA framework
> aims to achieve.
>
> There are various people who have built ML around UIMA, e.g.
> ClearTK (http://cleartk.github.io/cleartk/) or DKPro TC
> (https://dkpro.github.io/dkpro-tc/) - and as you did, it
> can be combined in various ways with ML frameworks that
> specialize specifically on ML.
>
> Cheers,
>
> -- Richard
