[jira] [Commented] (OPENNLP-1214) use hash to avoid linear search in DefaultEndOfSentenceScanner
[ https://issues.apache.org/jira/browse/OPENNLP-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16649846#comment-16649846 ]

Joern Kottmann commented on OPENNLP-1214:
-----------------------------------------

I was +1 for this change because it looks a bit nicer than having the loop in the code here. For a very small set size in particular, though, the hash lookup is slower than just performing a linear scan through the array. I doubt this change has any impact on run time when measured while performing sentence splitting.

> use hash to avoid linear search in DefaultEndOfSentenceScanner
> --------------------------------------------------------------
>
>                 Key: OPENNLP-1214
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1214
>             Project: OpenNLP
>          Issue Type: Improvement
>    Affects Versions: 1.9.0
>            Reporter: Koji Sekiguchi
>            Assignee: Koji Sekiguchi
>            Priority: Minor
>             Fix For: 1.9.1
>
>
> When DefaultEndOfSentenceScanner scans a sentence, it uses linear search to
> check whether each character in the sentence is one of the eos characters. I
> think we'd better use a HashSet to keep eosCharacters instead of a char[].
> In accordance with this replacement, I'd like to deprecate
> getEndOfSentenceCharacters() because it returns char[] and nobody
> in OpenNLP calls it at present, and I'd like to add an equivalent method
> which returns a Set of eos chars. The Set cannot preserve the order of the
> eos chars, but I don't think that will be a problem.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
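To make the trade-off discussed above concrete, here is a minimal sketch of the two lookup strategies. It is not the OpenNLP source: the class name EosLookupSketch, the method names, and the three-character eos set are all assumptions chosen for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not the actual DefaultEndOfSentenceScanner.
public class EosLookupSketch {

    // Assumed eos characters for this example only.
    static final char[] EOS_ARRAY = {'.', '!', '?'};

    // The same characters held in a HashSet, as the issue proposes.
    static final Set<Character> EOS_SET = new HashSet<>();
    static {
        for (char c : EOS_ARRAY) {
            EOS_SET.add(c); // each char is boxed to a Character
        }
    }

    // Linear scan: compares primitive chars, no boxing, no hashing.
    static boolean isEosLinear(char c) {
        for (char eos : EOS_ARRAY) {
            if (eos == c) {
                return true;
            }
        }
        return false;
    }

    // Hash lookup: c is boxed to a Character, then hashed and compared.
    static boolean isEosHashed(char c) {
        return EOS_SET.contains(c);
    }

    // Counts eos characters in text using the selected strategy.
    static int countEos(CharSequence text, boolean useHash) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (useHash ? isEosHashed(c) : isEosLinear(c)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "Hello there. How are you? Fine!";
        System.out.println(countEos(text, false)); // prints 3
        System.out.println(countEos(text, true));  // prints 3
    }
}
```

Both strategies return the same answers. With only a handful of eos characters, the linear scan does at most a few primitive comparisons per input character, while the HashSet path adds boxing and a hash computation per lookup, which is consistent with the doubt raised in the comment. Any real conclusion would need a benchmark of actual sentence splitting.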
[jira] [Reopened] (OPENNLP-1214) use hash to avoid linear search in DefaultEndOfSentenceScanner
[ https://issues.apache.org/jira/browse/OPENNLP-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Sekiguchi reopened OPENNLP-1214:
------------------------------------
[jira] [Commented] (OPENNLP-1214) use hash to avoid linear search in DefaultEndOfSentenceScanner
[ https://issues.apache.org/jira/browse/OPENNLP-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16649820#comment-16649820 ]

ASF GitHub Bot commented on OPENNLP-1214:
-----------------------------------------

kojisekig opened a new pull request #336: Revert "OPENNLP-1214: use hash to avoid linear search in DefaultEndOfSentence…"
URL: https://github.com/apache/opennlp/pull/336

Reverts apache/opennlp#329

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
[jira] [Commented] (OPENNLP-1214) use hash to avoid linear search in DefaultEndOfSentenceScanner
[ https://issues.apache.org/jira/browse/OPENNLP-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16649815#comment-16649815 ]

ASF GitHub Bot commented on OPENNLP-1214:
-----------------------------------------

kojisekig commented on issue #329: OPENNLP-1214: use hash to avoid linear search in DefaultEndOfSentence…
URL: https://github.com/apache/opennlp/pull/329#issuecomment-429729671

Sorry for the late reply; I've been busy with my project.

Actually, when I merged this, I doubted how much this change improved performance. Let me withdraw it for now, as I don't have any particular reason to stick with this change.