[GitHub] opennlp pull request #51: OPENNLP-923: Wrap all lines longer than 110 chars

2017-01-11 Thread kottmann
GitHub user kottmann opened a pull request: https://github.com/apache/opennlp/pull/51 OPENNLP-923: Wrap all lines longer than 110 chars You can merge this pull request into a Git repository by running: $ git pull https://github.com/kottmann/opennlp OPENNLP-923-2

[GitHub] opennlp pull request #50: OPENNLP-930: [WIP Don't Merge] Write test for Rege...

2017-01-11 Thread smarthi
GitHub user smarthi opened a pull request: https://github.com/apache/opennlp/pull/50 OPENNLP-930: [WIP Don't Merge] Write test for RegexNameFinderFactory You can merge this pull request into a Git repository by running: $ git pull https://github.com/smarthi/opennlp

[GitHub] opennlp pull request #49: OPENNLP-923: Wrap all lines longer than 110 chars

2017-01-11 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/opennlp/pull/49

[GitHub] opennlp pull request #48: OPENNLP-719: Override any name type with specified...

2017-01-11 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/opennlp/pull/48

[GitHub] opennlp pull request #49: OPENNLP-923: Wrap all lines longer than 110 chars

2017-01-11 Thread kottmann
GitHub user kottmann opened a pull request: https://github.com/apache/opennlp/pull/49 OPENNLP-923: Wrap all lines longer than 110 chars, and also add checkstyle enforcement. You can merge this pull request into a Git repository by running: $ git pull

Re: Thread-safe versions of some of the tools

2017-01-11 Thread Joern Kottmann
On Wed, 2017-01-11 at 17:14, Russ, Daniel (NIH/CIT) [E] wrote: > Hi, I am a little confused. Why do you want to share an instance of a SentenceDetectorME across threads? Are your documents very long single sentences? I don’t think there is enough work for the SentenceDetectorME to

Re: Thread-safe versions of some of the tools

2017-01-11 Thread Joern Kottmann
+1, ease of use is important for us and has always been a strong focus here. Jörn On Wed, 2017-01-11 at 17:39 +0100, Thilo Goetz wrote: > You can do all sorts of things. I implemented a version now that uses ThreadLocals. Works fine, but quite frankly, it's a pain in the butt. The world
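A rough sketch of the ThreadLocal workaround described above, assuming the OpenNLP 1.x SentenceDetectorME/SentenceModel API; the model file name is only a placeholder:

    // Sketch: the model is shared, each thread lazily gets its own
    // SentenceDetectorME instance the first time it calls detect().
    import java.io.FileInputStream;
    import opennlp.tools.sentdetect.SentenceDetectorME;
    import opennlp.tools.sentdetect.SentenceModel;

    public class ThreadLocalSentenceDetector {
      private final ThreadLocal<SentenceDetectorME> detector;

      public ThreadLocalSentenceDetector(SentenceModel model) {
        // withInitial runs once per thread; the SentenceModel itself is read-only
        this.detector = ThreadLocal.withInitial(() -> new SentenceDetectorME(model));
      }

      public String[] detect(String text) {
        return detector.get().sentDetect(text);
      }

      public static void main(String[] args) throws Exception {
        SentenceModel model = new SentenceModel(new FileInputStream("en-sent.bin"));
        ThreadLocalSentenceDetector sd = new ThreadLocalSentenceDetector(model);
        System.out.println(sd.detect("First sentence. Second sentence.").length);
      }
    }

It works, but as noted it pushes the bookkeeping onto every caller, which is the "pain" being complained about here.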

Re: Thread-safe versions of some of the tools

2017-01-11 Thread Joern Kottmann
On Wed, 2017-01-11 at 11:05 +0100, Thilo Goetz wrote: > in a recent project, I was using SentenceDetectorME, TokenizerME and POSTaggerME. It turns out that none of those is thread safe. This is because the classification probabilities for the last tag() call (for example) are stored in

Re: Thread-safe versions of some of the tools

2017-01-11 Thread Russ, Daniel (NIH/CIT) [E]
Hi, I am a little confused. Why do you want to share an instance of a SentenceDetectorME across threads? Are your documents very long single sentences? I don’t think there is enough work for the SentenceDetectorME to make up the cost of multithreading on 4 cores. Previously, I had

Re: Thread-safe versions of some of the tools

2017-01-11 Thread Cohan Sujay Carlos
Control over threading is not required to "share the model between threads and create one instance of the component per thread". One could use a scope where variable references are guaranteed to be stored in the call stack (say method-local variables in Java). You could then: a) Instantiate
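A small sketch of the call-stack-scoped approach suggested above, assuming the OpenNLP 1.x TokenizerME/TokenizerModel API; the class name and model file are illustrative:

    // Sketch: keep the component in a method-local variable so every call
    // (and therefore every thread) works on its own instance; only the
    // immutable model is shared.
    import java.io.FileInputStream;
    import opennlp.tools.tokenize.TokenizerME;
    import opennlp.tools.tokenize.TokenizerModel;

    public class StackScopedTokenizer {
      private final TokenizerModel model; // shared, read-only

      public StackScopedTokenizer(TokenizerModel model) {
        this.model = model;
      }

      public String[] tokenize(String text) {
        TokenizerME tokenizer = new TokenizerME(model); // local to this call
        return tokenizer.tokenize(text);
      }

      public static void main(String[] args) throws Exception {
        TokenizerModel model = new TokenizerModel(new FileInputStream("en-token.bin"));
        StackScopedTokenizer tok = new StackScopedTokenizer(model);
        System.out.println(tok.tokenize("Hello, OpenNLP!").length);
      }
    }

Creating a new TokenizerME per call adds some allocation overhead; caching per thread (e.g. the ThreadLocal variant discussed elsewhere in this thread) is the usual refinement when that cost matters.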

Re: Thread-safe versions of some of the tools

2017-01-11 Thread Joern Kottmann
+1 to make SentenceDetectorME and TokenizerME thread safe and everything else where it works out for us. Making it thread safe only makes sense if you can get the throughput almost multiplied by using more cores. This works with the current model. For the POSTagger we would have to change the API

Re: Thread-safe versions of some of the tools

2017-01-11 Thread Thilo Goetz
Correct me if I'm wrong, but that approach only works if you control the thread creation yourself. In my case, for example, I was using Scala's parallel collection API, and had no control over the threading. I will usually want to create one service that does tokenization or POS tagging or

Re: Thread-safe versions of some of the tools

2017-01-11 Thread Joern Kottmann
Hello Thilo, I am interested in your opinion about how this is done currently. We say: "Share the model between threads and create one instance of the component per thread". Wouldn't that work well in your use case? Jörn On Wed, Jan 11, 2017 at 11:05 AM, Thilo Goetz wrote:
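A minimal sketch of the recommended pattern quoted above ("share the model between threads and create one instance of the component per thread"), assuming the OpenNLP 1.x API; the model file and the documents are placeholders:

    // Sketch: one shared SentenceModel, one SentenceDetectorME per worker thread.
    import java.io.FileInputStream;
    import opennlp.tools.sentdetect.SentenceDetectorME;
    import opennlp.tools.sentdetect.SentenceModel;

    public class PerThreadDetector {
      public static void main(String[] args) throws Exception {
        SentenceModel model = new SentenceModel(new FileInputStream("en-sent.bin"));
        String[][] batches = { {"Doc one. Two sentences."}, {"Doc two. Also two."} };

        Thread[] workers = new Thread[batches.length];
        for (int i = 0; i < batches.length; i++) {
          String[] batch = batches[i];
          workers[i] = new Thread(() -> {
            // Each thread builds its own component from the shared model.
            SentenceDetectorME detector = new SentenceDetectorME(model);
            for (String doc : batch) {
              System.out.println(detector.sentDetect(doc).length + " sentences");
            }
          });
          workers[i].start();
        }
        for (Thread w : workers) {
          w.join();
        }
      }
    }

This is exactly the pattern Thilo objects to below: it only works when the application controls thread creation, which frameworks like Scala's parallel collections do not expose.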

[GitHub] opennlp pull request #44: OPENNLP-137 - Training cmd line tools should measu...

2017-01-11 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/opennlp/pull/44

Thread-safe versions of some of the tools

2017-01-11 Thread Thilo Goetz
Hi, in a recent project, I was using SentenceDetectorME, TokenizerME and POSTaggerME. It turns out that none of those is thread safe. This is because the classification probabilities for the last tag() call (for example) are stored in a member variable and can be retrieved by a separate API
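A minimal sketch of the hazard described here, assuming the OpenNLP 1.x POSTaggerME API (tag() plus a probs() accessor for the last call); the model path and input tokens are illustrative:

    // Sketch: why sharing one POSTaggerME across threads is unsafe.
    import java.io.FileInputStream;
    import opennlp.tools.postag.POSModel;
    import opennlp.tools.postag.POSTaggerME;

    public class SharedTaggerRace {
      public static void main(String[] args) throws Exception {
        POSModel model = new POSModel(new FileInputStream("en-pos-maxent.bin"));
        POSTaggerME tagger = new POSTaggerME(model); // single shared instance

        Runnable work = () -> {
          String[] tags = tagger.tag(new String[] {"OpenNLP", "is", "useful"});
          // probs() reads state written by the *last* tag() call on this
          // instance, which may already belong to another thread's sentence.
          double[] probs = tagger.probs();
          System.out.println(tags.length + " tags, " + probs.length + " probs");
        };

        new Thread(work).start();
        new Thread(work).start();
      }
    }

Because the per-call probabilities live in a member variable rather than in the return value, two threads interleaving tag() and probs() can silently read each other's results.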

[GitHub] opennlp pull request #47: OPENNLP-932: Use checkstyle suppression instead of...

2017-01-11 Thread kottmann
GitHub user kottmann opened a pull request: https://github.com/apache/opennlp/pull/47 OPENNLP-932: Use checkstyle suppression instead of mvn exclude You can merge this pull request into a Git repository by running: $ git pull https://github.com/kottmann/opennlp OPENNLP-932