Good day, Peter,

We are learning UIMA Ruta and are running into some problems with it. As I posted on Stack Overflow, a lot of the data in our documents does not fit the traditional natural-language mold: we have plenty of alphanumeric data such as file hashes, email addresses, domain names, and so on. We tried to rework the JFlex lexer and rebuild ruta-core, but we are now struggling to get the modified build working in the Ruta Workbench. Is there a better way to parse out and annotate this kind of data? A file containing sentences, or tabular data with MD5 hashes, would be a good example of the input we need to handle.
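
To make it concrete, here is a rough sketch of the kind of rule we were hoping to write. The type names (Md5Hash, EmailAddress) are placeholders of our own, and we may well be misreading the documentation on simple regular-expression rules:

    PACKAGE uima.ruta.example;

    // Placeholder types for the alphanumeric data we want to annotate.
    DECLARE Md5Hash;
    DECLARE EmailAddress;

    // Simple regular-expression rules, if we understand them correctly,
    // match on the document text itself rather than on the seeded tokens,
    // so the default JFlex-based tokenization would not get in the way.
    "[0-9a-fA-F]{32}" -> Md5Hash;
    "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+[.][A-Za-z]{2,}" -> EmailAddress;

If plain regular-expression rules like these are the recommended approach for this kind of data, we would gladly drop the custom lexer entirely.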
Thank you,
Fran
