[jira] [Commented] (OPENNLP-1214) use hash to avoid linear search in DefaultEndOfSentenceScanner

2018-10-15 Thread Joern Kottmann (JIRA)


[ 
https://issues.apache.org/jira/browse/OPENNLP-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16649846#comment-16649846
 ] 

Joern Kottmann commented on OPENNLP-1214:
-

I was +1 for this change because it looks a bit nicer than having the loop in 
the code here. That said, especially for a very small set size, a hash lookup 
is slower than just performing a linear scan through the array.

I doubt this change has any measurable impact on run time when sentence 
splitting is measured as a whole.

> use hash to avoid linear search in DefaultEndOfSentenceScanner
> --
>
> Key: OPENNLP-1214
> URL: https://issues.apache.org/jira/browse/OPENNLP-1214
> Project: OpenNLP
> Issue Type: Improvement
> Affects Versions: 1.9.0
> Reporter: Koji Sekiguchi
> Assignee: Koji Sekiguchi
> Priority: Minor
> Fix For: 1.9.1
>
>
> When DefaultEndOfSentenceScanner scans a sentence, it uses a linear search to 
> check whether each character in the sentence is one of the eos characters. I 
> think we should use a HashSet to hold eosCharacters instead of a char[].
> Along with this replacement, I'd like to deprecate 
> getEndOfSentenceCharacters(), because it returns char[] and nothing in 
> OpenNLP calls it at present, and add an equivalent method that returns a Set 
> of eos chars. A Set cannot preserve the order of the eos chars, but I don't 
> think that will be a problem.
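
For readers following the thread, the two lookup strategies under discussion can be sketched roughly as below. This is only an illustration, not the OpenNLP code: the class name EosLookupSketch, its method names, and the three characters used as the eos set are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; the names and the eos set are assumptions, not OpenNLP's API.
final class EosLookupSketch {

    // Assumed eos characters, chosen for illustration only.
    static final char[] EOS_ARRAY = {'.', '!', '?'};

    // The same characters held in a HashSet, as the issue proposes.
    static final Set<Character> EOS_SET = new HashSet<>();
    static {
        for (char c : EOS_ARRAY) {
            EOS_SET.add(c);
        }
    }

    // Linear scan: compares c against each array element in turn.
    static boolean isEosLinear(char c) {
        for (char eos : EOS_ARRAY) {
            if (eos == c) {
                return true;
            }
        }
        return false;
    }

    // Hash lookup: c is autoboxed to Character, then hashed and compared.
    static boolean isEosHashed(char c) {
        return EOS_SET.contains(c);
    }

    // Collects the indices of eos characters using the linear scan.
    static List<Integer> positionsLinear(CharSequence text) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            if (isEosLinear(text.charAt(i))) {
                positions.add(i);
            }
        }
        return positions;
    }

    // Collects the indices of eos characters using the hash lookup.
    static List<Integer> positionsHashed(CharSequence text) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            if (isEosHashed(text.charAt(i))) {
                positions.add(i);
            }
        }
        return positions;
    }
}
```

Both variants return the same positions for the same input. With a set this small, the hashed variant adds boxing and hashing on every character, so it is not obviously faster; measuring would be needed to say either way.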



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Reopened] (OPENNLP-1214) use hash to avoid linear search in DefaultEndOfSentenceScanner

2018-10-15 Thread Koji Sekiguchi (JIRA)


 [ 
https://issues.apache.org/jira/browse/OPENNLP-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi reopened OPENNLP-1214:
-



[jira] [Commented] (OPENNLP-1214) use hash to avoid linear search in DefaultEndOfSentenceScanner

2018-10-15 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/OPENNLP-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16649820#comment-16649820
 ] 

ASF GitHub Bot commented on OPENNLP-1214:
-

kojisekig opened a new pull request #336: Revert "OPENNLP-1214: use hash to 
avoid linear search in DefaultEndOfSentence…"
URL: https://github.com/apache/opennlp/pull/336
 
 
   Reverts apache/opennlp#329


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (OPENNLP-1214) use hash to avoid linear search in DefaultEndOfSentenceScanner

2018-10-15 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/OPENNLP-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16649815#comment-16649815
 ] 

ASF GitHub Bot commented on OPENNLP-1214:
-

kojisekig commented on issue #329: OPENNLP-1214: use hash to avoid linear 
search in DefaultEndOfSentence…
URL: https://github.com/apache/opennlp/pull/329#issuecomment-429729671
 
 
   Sorry for the late reply; I've been busy with my project.
   
   Actually, when I merged this, I had doubts about how much this change 
improved performance.
   
   Let me withdraw it for now, as I don't have any particular reason to stick 
with this change.



