[jira] [Commented] (UIMA-6194) Ruta: RutaLiteralMatcher throws exception for special choice of string
[ https://issues.apache.org/jira/browse/UIMA-6194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067076#comment-17067076 ]

Michael Stenger commented on UIMA-6194:
---

OK, understood. Thanks for explaining.

> Ruta: RutaLiteralMatcher throws exception for special choice of string
> --
>
> Key: UIMA-6194
> URL: https://issues.apache.org/jira/browse/UIMA-6194
> Project: UIMA
> Issue Type: Bug
> Components: Ruta
> Affects Versions: 2.8.0ruta
> Reporter: Michael Stenger
> Assignee: Peter Klügl
> Priority: Minor
> Fix For: 2.8.1ruta, 3.0.1ruta
>
>
> For certain combinations of document text and RuleElementLiteral in the
> script, method getAnnotation of class RutaLiteralMatcher throws a
> NullPointerException. This seems to be the case whenever the used string is
> a postfix or infix of a word in the document, but itself doesn't occur.
> h4. Example
> Script
>
> {code:java}
> DECLARE testType;
> "est" {-> testType};
> "est te"{-> testType};
> {code}
> Document
>
> {code:java}
> test test{code}
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (UIMA-6194) Ruta: RutaLiteralMatcher throws exception for special choice of string
[ https://issues.apache.org/jira/browse/UIMA-6194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067047#comment-17067047 ]

Peter Klügl commented on UIMA-6194:
---

Yes and no. Matches are not restricted to complete tokens, but they must begin and end at the offsets of some annotation, which could be smaller than a token. It could even be a single character.

I added some more tests to check that the literal string match is restricted to RutaBasic:

{noformat}
LiteralStringMatchTest.testInRutaBasicMatch()
{noformat}
[jira] [Commented] (UIMA-6194) Ruta: RutaLiteralMatcher throws exception for special choice of string
[ https://issues.apache.org/jira/browse/UIMA-6194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066893#comment-17066893 ]

Michael Stenger commented on UIMA-6194:
---

I used the test for RutaLiteralMatcher in the repository to check how it responds to the examples I mentioned above. I therefore didn't use other analysis engines or modify the tokens in the test document. If you modify the test in the same way, you should see what I mean.

What I take from the second comment is that RuleElementLiteral is supposed to work on complete tokens only, not on snippets like "est" instead of "test". Is that what you mean?
[jira] [Commented] (UIMA-6194) Ruta: RutaLiteralMatcher throws exception for special choice of string
[ https://issues.apache.org/jira/browse/UIMA-6194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066542#comment-17066542 ]

Peter Klügl commented on UIMA-6194:
---

OK, after getting some sleep and thinking about it some more: what I wrote is the intended behavior, but it is not the current implementation. The changes to the literal string matcher in the last release were implemented too quickly, in the belief that the test coverage was good enough. That is not true. I will add more tests and fix the behavior.
[jira] [Commented] (UIMA-6194) Ruta: RutaLiteralMatcher throws exception for special choice of string
[ https://issues.apache.org/jira/browse/UIMA-6194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066167#comment-17066167 ]

Peter Klügl commented on UIMA-6194:
---

What exactly do you mean when you say that you tried them and that they pass the matcher? With a script, or with a direct call to the matcher?

The basic annotations are RutaBasic annotations, which are automatically created and managed to represent a complete disjoint partitioning. So you can modify the matching behavior of the literal string matches, or of the dictionary lookup, by adding your own annotations, e.g., for decompounding.

If you have prepended other analysis engines, used some simple regex rules, or modified offsets manually, there could be RutaBasics smaller than TokenSeeds. (This may sound strange, but I think it's a cool feature.)
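The effect of adding finer annotations can be illustrated with a small, self-contained sketch. It is a simplified model and not Ruta code: the class `PartitionRefinement`, its methods, and the plain `int[]` offset pairs are hypothetical stand-ins for annotations, and the real management of RutaBasic is more involved. The sketch only shows that every annotation offset becomes a boundary of the partitioning, so an additional annotation can make a span boundary-aligned that was not aligned before.

```java
import java.util.TreeSet;

/** Hypothetical model of a partitioning induced by annotation offsets; not Ruta code. */
final class PartitionRefinement {

    /** Collects document start, document end, and every annotation begin/end as boundaries. */
    static TreeSet<Integer> boundaries(int textLength, int[][] annotations) {
        TreeSet<Integer> result = new TreeSet<>();
        result.add(0);
        result.add(textLength);
        for (int[] annotation : annotations) {
            result.add(annotation[0]); // begin offset
            result.add(annotation[1]); // end offset
        }
        return result;
    }

    /** A span is aligned if it is non-empty and both of its ends are boundaries. */
    static boolean isAligned(TreeSet<Integer> boundaries, int begin, int end) {
        return begin < end && boundaries.contains(begin) && boundaries.contains(end);
    }

    public static void main(String[] args) {
        // Document "test test": two tokens and the space between them.
        int[][] tokens = {{0, 4}, {4, 5}, {5, 9}};
        TreeSet<Integer> coarse = boundaries(9, tokens);
        System.out.println(isAligned(coarse, 1, 4)); // "est" is inside a token

        // An extra annotation over [1, 4), e.g. from a regex rule, splits the first token.
        int[][] refined = {{0, 4}, {4, 5}, {5, 9}, {1, 4}};
        TreeSet<Integer> fine = boundaries(9, refined);
        System.out.println(isAligned(fine, 1, 4));
    }
}
```

In this model, the span covering "est" is not aligned with the token-only partitioning, but becomes aligned once an annotation with offsets 1 and 4 exists.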
[jira] [Commented] (UIMA-6194) Ruta: RutaLiteralMatcher throws exception for special choice of string
[ https://issues.apache.org/jira/browse/UIMA-6194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17065914#comment-17065914 ]

Michael Stenger commented on UIMA-6194:
---

I have another question on this subject: the matching behavior of RutaLiteralMatcher confuses me a bit. The comment in method getAnnotation of that class indicates that only strings ranging from the start of a basic annotation to the end of a basic annotation are considered for matching. In the respective test class, RutaLiteralMatcherTest, strings such as "test", "is a test", and "." should be matched, but not "est" or "s a tes". Still, if I try "est" or "Th", they do pass the matcher. Is that intended behavior? Thanks.
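The rule described in this comment (a literal matches only if it begins at the start of a basic annotation and ends at the end of one) can be sketched as follows. This is a minimal illustration under stated assumptions, not the RutaLiteralMatcher implementation: the class `BoundaryAlignedMatch`, its methods, and the simple segmentation into whitespace runs, alphanumeric runs, and single other characters are all hypothetical, and the sample text "This is a test." is assumed only because it fits the strings mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical sketch of boundary-aligned literal matching; not the Ruta implementation. */
final class BoundaryAlignedMatch {

    /** Boundaries of a simple segmentation: whitespace runs, alphanumeric runs, single other chars. */
    static TreeSet<Integer> simpleBoundaries(String text) {
        TreeSet<Integer> result = new TreeSet<>();
        result.add(0);
        result.add(text.length());
        for (int i = 1; i < text.length(); i++) {
            int previous = kind(text.charAt(i - 1));
            int current = kind(text.charAt(i));
            // A new segment starts when the character class changes, or at every "other" char.
            if (previous != current || current == 2) {
                result.add(i);
            }
        }
        return result;
    }

    private static int kind(char c) {
        if (Character.isWhitespace(c)) return 0;
        if (Character.isLetterOrDigit(c)) return 1;
        return 2; // punctuation and everything else
    }

    /** Returns the begin offsets of all occurrences whose begin and end are both boundaries. */
    static List<Integer> find(String text, String literal, TreeSet<Integer> boundaries) {
        List<Integer> hits = new ArrayList<>();
        if (literal.isEmpty()) {
            return hits;
        }
        int begin = text.indexOf(literal);
        while (begin >= 0) {
            int end = begin + literal.length();
            if (boundaries.contains(begin) && boundaries.contains(end)) {
                hits.add(begin);
            }
            begin = text.indexOf(literal, begin + 1);
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "This is a test.";
        TreeSet<Integer> boundaries = simpleBoundaries(text);
        for (String literal : new String[] {"test", "is a test", ".", "est", "s a tes", "Th"}) {
            System.out.println('"' + literal + "\" -> " + find(text, literal, boundaries));
        }
    }
}
```

Under this model, "test", "is a test", and "." each yield one hit, while "est", "s a tes", and "Th" yield none, because at least one end of each occurrence falls inside a segment. A literal that never aligns simply produces an empty list.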
[jira] [Commented] (UIMA-6194) Ruta: RutaLiteralMatcher throws exception for special choice of string
[ https://issues.apache.org/jira/browse/UIMA-6194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063367#comment-17063367 ]

Peter Klügl commented on UIMA-6194:
---

Hi, sorry for the delayed response. This should already be fixed in the current trunk/snapshot. I will prepare a bugfix release as soon as possible.