Hello Mr. Baker,

Do you have any more supporting information on Baleen? Perhaps a website? I
don’t see it referenced on GitHub.

Thanks,
b

~~~~~
May All Your Sequences Converge

> On Dec 15, 2015, at 3:40 AM, Baker James D <[email protected]> wrote:
> 
> Classification: UK OFFICIAL
> Morning all,
> 
> A new version of Baleen, the UIMA-based entity extraction and text analytics 
> framework developed by Dstl (part of the UK Ministry of Defence), has been 
> released. This version includes the following improvements:
> 
> 
> * New Annotator: MongoStemming uses a gazetteer and stemming to perform a
>   pseudo-fuzzy match and find gazetteer terms in different tenses and
>   plurals
> 
> * New Cleaner: MergeAdjacent will merge adjacent entities of the same type
> 
> * New Content Extractor: CsvContentExtractor splits CSV fields into content
>   and metadata
> 
> * New Collection Reader: LineReader will read a single file into multiple
>   documents by line
> 
> * New REST API to get configuration parameters for components (e.g.
>   annotators)
> 
> * Significant changes to the way gazetteer annotators work, including
>   changing from RadixTrees to MultiMaps and implementing the Aho-Corasick
>   algorithm, resulting in performance improvements for large gazetteers in
>   the order of 100s
> 
> * Lots of bug fixes and minor improvements
> 
> The latest release is available on GitHub: https://github.com/dstl/baleen
> 
> Any feedback, suggestions, comments, issues and code contributions are 
> welcome! We're keen for people to help us improve it so that it's a useful 
> tool for a wide range of people.
> 
> James
