On Thu, Jun 29, 2017 at 8:36 PM, Ling <lingv...@gmail.com> wrote:

> Hi, Suneel, that's great. The reason was that I wanted to do something in
> Deeplearning4j and happened to find that OpenNLP was integrated into it
> already. So I just used their API to call OpenNLP.
>
> Is there a set date for the next release? Also, are the 1.5 models the same
> as the models to be included in the 1.8.1 release?
>
Should be some time next week. If you are talking about usage by 'models
being the same', then yes, nothing changes in how you invoke the model from
your code.

> Thanks.
> Ling
>
> On Thu, Jun 29, 2017 at 5:30 PM, Suneel Marthi <smar...@apache.org> wrote:
>
> > On Thu, Jun 29, 2017 at 8:07 PM, Ling <lingv...@gmail.com> wrote:
> >
> > > Hi, Jörn:
> > >
> > > I want to use OpenNLP directly, instead of Deeplearning4j and UIMA. I
> > > included the Maven 1.8 version in my POM file; do I still need to
> > > download the models separately? I can't find those model files. For
> > > example, to do a simple test on the tokenization model,
> > >
> >
> > DL4J is for deep learning, OpenNLP is for text processing. Not sure why
> > you would go to DL4J first and revert to OpenNLP if all you want to do
> > is basic text processing.
> >
> > The model files (1.5 models) are presently at
> > http://opennlp.sourceforge.net/models-1.5/
> >
> > > InputStream is = new FileInputStream("en-token.bin");
> > >
> > > Do I have to download en-token.bin separately? I am working in a
> > > Maven project. Thank you.
> > >
> >
> > Yes, the models need to be downloaded separately.
> >
> > We finally got approval from the Apache Foundation to distribute OpenNLP
> > models through Apache. Following the upcoming 1.8.1 release, we should be
> > distributing updated 1.8.1 models too, once we hash out the details for
> > doing that.
> >
> > > Ling
> > >
> > > On Thu, Jun 29, 2017 at 10:42 AM, Joern Kottmann <kottm...@gmail.com>
> > > wrote:
> > >
> > > > Long chain, yes, then you probably use the SourceForge tokenization
> > > > model that was trained on some old news.
> > > >
> > > > We usually don't consider mistakes the models make as bugs because we
> > > > can't do much about them other than suggesting to use models that fit
> > > > your data very well, and even in that case models can be wrong
> > > > sometimes.
> > > >
> > > > If there is something we can do here to reduce the error rate, then we
> > > > are very happy to get that as a contribution or just pointed out.
> > > >
> > > > Jörn
> > > >
> > > > On Thu, Jun 29, 2017 at 6:54 PM, Ling <lingv...@gmail.com> wrote:
> > > > > Hi, Jörn:
> > > > >
> > > > > I am using Deeplearning4j, which uses the org.apache.uima library, I
> > > > > think. And then UIMA uses OpenNLP. Probably that's what happens.
> > > > >
> > > > > So it isn't OpenNLP's original problem? Thank you.
> > > > >
> > > > > Ling
> > > > >
> > > > > On Thu, Jun 29, 2017 at 12:30 AM, Joern Kottmann <kottm...@gmail.com>
> > > > > wrote:
> > > > >
> > > > >> Hello,
> > > > >>
> > > > >> which model are you using? Did you train it yourself?
> > > > >>
> > > > >> Jörn
> > > > >>
> > > > >> On Thu, Jun 29, 2017 at 4:04 AM, Ling <lingv...@gmail.com> wrote:
> > > > >> > Hi, all:
> > > > >> >
> > > > >> > I am testing OpenNLP and found a significant tokenization issue
> > > > >> > involving punctuation.
> > > > >> >
> > > > >> > Thank you Costco!
> > > > >> > i love costco!
> > > > >> > I love Costco!!
> > > > >> > FUCK IKEA.
> > > > >> >
> > > > >> > In all these cases, the final punctuation is not split off, so
> > > > >> > "Costco!" and "IKEA." are each treated as one token. This looks like
> > > > >> > a systematic problem. Before I file an issue on the OpenNLP project,
> > > > >> > I want to make sure this issue is truly coming from the library.
> > > > >> >
> > > > >> > Has any of you encountered a similar problem? Thanks.
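
For the punctuation cases reported at the start of this thread ("Costco!" and "IKEA." coming back as single tokens), one way to check whether the glued tokens come from the trained model rather than from the input is to compare against a purely rule-based split. The sketch below is plain Java with no OpenNLP dependency. The class name PunctuationSplitter and its tokenize method are hypothetical names made up for illustration, and this is not OpenNLP's implementation. It starts a new token whenever the character class changes (letter, digit, other) and emits every punctuation character as its own token. OpenNLP also ships a rule-based opennlp.tools.tokenize.SimpleTokenizer that needs no model file, which may be worth trying before writing anything custom.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of OpenNLP.
final class PunctuationSplitter {

    // Coarse character classes used to decide where one token ends.
    private enum CharClass { WHITESPACE, LETTER, DIGIT, OTHER }

    private static CharClass classify(char c) {
        if (Character.isWhitespace(c)) return CharClass.WHITESPACE;
        if (Character.isLetter(c)) return CharClass.LETTER;
        if (Character.isDigit(c)) return CharClass.DIGIT;
        return CharClass.OTHER; // punctuation and symbols
    }

    // Splits text on whitespace and on every change of character class.
    // Each OTHER character (e.g. '!' or '.') becomes a token of its own.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int start = -1; // index where the current token began, -1 if none
        CharClass previous = CharClass.WHITESPACE;
        for (int i = 0; i < text.length(); i++) {
            CharClass current = classify(text.charAt(i));
            boolean boundary = current != previous || current == CharClass.OTHER;
            if (boundary) {
                if (start >= 0) {
                    tokens.add(text.substring(start, i));
                }
                start = (current == CharClass.WHITESPACE) ? -1 : i;
            }
            previous = current;
        }
        if (start >= 0) {
            tokens.add(text.substring(start));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Thank you Costco!")); // [Thank, you, Costco, !]
        System.out.println(tokenize("I love Costco!!"));   // [I, love, Costco, !, !]
    }
}
```

A split this crude also breaks contractions ("don't" becomes three tokens) and decimal numbers, so it is only a diagnostic baseline; a model trained on text that resembles the input, as suggested earlier in the thread, remains the more robust route.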