Hello Mr. Baker,

Do you have any more supporting information on Baleen, perhaps a website? I don’t see one referenced on GitHub.

Thanks,
b

~~~~~
May All Your Sequences Converge

> On Dec 15, 2015, at 3:40 AM, Baker James D <[email protected]> wrote:
>
> Classification: UK OFFICIAL
>
> Morning all,
>
> A new version of Baleen, the UIMA-based entity extraction and text analytics
> framework developed by Dstl (part of the UK Ministry of Defence), has been
> released. This version includes the following improvements:
>
> * New Annotator: MongoStemming uses a gazetteer and stemming to
>   perform a pseudo-fuzzy match and find gazetteer terms in different tenses
>   and plurals
>
> * New Cleaner: MergeAdjacent will merge adjacent entities of the same
>   type
>
> * New Content Extractor: CsvContentExtractor splits CSV fields into
>   content and metadata
>
> * New Collection Reader: LineReader will read a single file into
>   multiple documents by line
>
> * New REST API to get configuration parameters for components (e.g.
>   annotators)
>
> * Significant changes to the way gazetteer annotators work, including
>   changing from RadixTrees to MultiMaps and implementing the Aho-Corasick
>   algorithm, resulting in performance improvements for large gazetteers in
>   the order of 100s
>
> * Lots of bug fixes and minor improvements
>
> The latest release is available on GitHub: https://github.com/dstl/baleen
>
> Any feedback, suggestions, comments, issues and code contributions are
> welcome! We're keen for people to help us improve it so that it's a useful
> tool for a wide range of people.
>
> James
>
> "This e-mail and any attachment(s) is intended for the recipient only. Its
> unauthorised use, disclosure, storage or copying is not permitted.
> Communications with Dstl are monitored and/or recorded for system
> efficiency and other lawful purposes, including business intelligence,
> business metrics and training. Any views or opinions expressed in this
> e-mail do not necessarily reflect Dstl policy."
>
> "If you are not the intended recipient, please remove it from your system
> and notify the author of the email and [email protected]"
