Re: Releasing a Language Detection Model

2017-07-11 Thread William Colen
+1 2017-07-11 10:30 GMT-03:00 Joern Kottmann: > Hello, > > right, very good point, I also think that it is very important to load > a model in one from the classpath. > > I propose we have the following setup: > - One jar contains one or multiple model packages (that's the zip container) > - A m

Re: Releasing a Language Detection Model

2017-07-11 Thread Chris Mattmann
Sounds good to me… On 7/11/17, 9:30 AM, "Joern Kottmann" wrote: Hello, right, very good point, I also think that it is very important to load a model in one from the classpath. I propose we have the following setup: - One jar contains one or multiple model package

Re: Releasing a Language Detection Model

2017-07-11 Thread Joern Kottmann
1) This is already included by default in the model today; it is also possible to place more data in it, e.g. a file which contains eval results, a LICENSE and NOTICE file, etc. 2) I would take a "best effort" approach and only publish one model per task and data set, if there are not really good reaso

Re: Releasing a Language Detection Model

2017-07-11 Thread Suneel Marthi
...one last point before wrapping up this discussion. Is it possible that you could have more than one lang detect model, each trained with a different algorithm, such as 'MaxEnt', 'Naive Bayes', or 'Perceptron'? Questions: 1. Do we just publish one model trained on a specific algorithm, if so th

Re: Releasing a Language Detection Model

2017-07-11 Thread Joern Kottmann
Hello, right, very good point, I also think that it is very important to load a model in one from the classpath. I propose we have the following setup: - One jar contains one or multiple model packages (that's the zip container) - A model name itself should be kind of unique e.g. eng-ud-token.bin

Re: Releasing a Language Detection Model

2017-07-11 Thread Aliaksandr Autayeu
To clarify on models and jars: putting a model inside a jar might not be a good idea. I mean things like bla-bla.jar/en-sent.bin. Our models are already zipped, so they are "jars" already in a sense. We're good. However, the current packaging and metadata might not be very classpath friendly. The us

Re: Releasing a Language Detection Model

2017-07-11 Thread Chris Mattmann
Hi, FWIW, I’ve seen CLI tools – lots in my day – that can load from the CLI to override an internal classpath dependency. This is for people in environments who want a sensible / delivered internal classpath default and the ability for run-time, non zipped up/messing with JAR file override. Thi
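
The override pattern described in this message (a default model delivered on the classpath, replaceable at run time by a plain file) can be sketched as follows. This is a minimal illustration using only the Java standard library; `ModelResolver`, its `open` method, and the parameter names are hypothetical and are not part of OpenNLP or its CLI.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper (assumption, not an OpenNLP class): resolves a model
// stream from an explicit file if one is given, else from the classpath.
public class ModelResolver {

    public static InputStream open(Path override, String classpathName) throws IOException {
        // An explicit file passed by the user wins over the bundled default.
        if (override != null) {
            return Files.newInputStream(override);
        }
        // Otherwise fall back to the resource shipped on the classpath.
        InputStream in = ModelResolver.class.getClassLoader().getResourceAsStream(classpathName);
        if (in == null) {
            throw new IOException("Model not found on classpath: " + classpathName);
        }
        return in;
    }
}
```

A caller would pass `null` as the override to get the bundled default, or the path taken from a command-line argument to replace it.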

Re: Releasing a Language Detection Model

2017-07-11 Thread Joern Kottmann
I would not change the CLI to load models from jar files. I never used or saw a command line tool that expects a file as an input and would then also load it from inside a jar file. It will be hard to communicate how that works precisely in the CLI usage texts and this is not a feature anyone would

Re: Releasing a Language Detection Model

2017-07-11 Thread Joern Kottmann
I am also not for default models. We are a library and people use it inside other software products, that is the place where meaningful defaults can be defined. Maybe our lang model works very well, you take that, hard code it and forget for the next couple of years about it, or it doesn't work and

Re: Releasing a Language Detection Model

2017-07-10 Thread William Colen
Regarding lang detect, we will release one model with +100 languages. Anyone will be able to reproduce the training or improve according to their needs. For example, one can reduce the corpus to work only with Latin languages if that is their need and maybe it can work better in some applications.

Re: Releasing a Language Detection Model

2017-07-10 Thread druss
+1 for releasing models; as for the rest, I'm not sure how I feel. Is there just one model for the Language Detector? I don’t want this to become a versioning issue: langDect.bin version 1 goes with 1.8.1, but 2 goes with 1.8.2. Can anyone download the Leipzig corpus? Being able to reproduce the mod

Re: Releasing a Language Detection Model

2017-07-10 Thread Aliaksandr Autayeu
Great idea! +1 for releasing models. +1 to publish models in jars on Maven Central. This is the fastest way to get somebody started. Moreover, having an extensible mechanism for others to do it on their own is really helpful. I did this with extJWNL for packaging WordNet data files. It is also c

Re: Releasing a Language Detection Model

2017-07-10 Thread Jeff Zemerick
+1 to an opennlp-models jar on Maven Central that contains the models. +1 to having the models available for download separately (if easily possible) for users who know what they want. +1 to having the training data shared somewhere with scripts to generate the models. It will help protect against

Re: Releasing a Language Detection Model

2017-07-10 Thread Chris Mattmann
+1. In terms of releasing models, maybe an opennlp-models package, and then using Maven structure of src/main/resources//*.bin for putting the models. Then using an assembly descriptor to compile the above into a *-bin.jar? Cheers, Chris On 7/10/17, 4:09 PM, "Joern Kottmann" wrote: M

Re: Releasing a Language Detection Model

2017-07-10 Thread Joern Kottmann
My opinion about this is that we should offer the model as a Maven dependency for users who just want to use it in their projects, and also offer models for download for people to quickly try out OpenNLP. If the models can be downloaded, a new user could very quickly test it via the command line. I

Re: Releasing a Language Detection Model

2017-07-10 Thread William Colen
We need to address things such as sharing the evaluation results and how to reproduce the training. There are several possibilities for that, but there are points to consider: Will we store the model itself in a SCM repository or only the code that can build it? Will we deploy the models to a Mav