Regarding language detection, we will release one model covering 100+ languages.
Anyone will be able to reproduce the training or adapt it to their
needs. For example, one can reduce the corpus to cover only Latin
languages if that is all they need, which may work better in some
applications.
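As a rough illustration of reducing the corpus, here is a minimal Java sketch. It assumes each training sample is one line of the form "language code, tab, text"; that layout, the class name CorpusFilter, and the sample sentences are assumptions made for this example, not something stated in the thread.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper: not part of OpenNLP.
final class CorpusFilter {

    // Keeps only the samples whose language code (the text before the
    // first tab) is in `wanted`. Lines without a tab are dropped.
    static List<String> keepLanguages(List<String> samples, Set<String> wanted) {
        return samples.stream()
            .filter(line -> {
                int tab = line.indexOf('\t');
                return tab > 0 && wanted.contains(line.substring(0, tab));
            })
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up samples in the assumed "code<TAB>text" layout.
        List<String> corpus = List.of(
            "por\tBom dia.",
            "deu\tGuten Morgen.",
            "spa\tHola mundo.");
        keepLanguages(corpus, Set.of("por", "spa")).forEach(System.out::println);
    }
}
```

The filtered lines could then be fed to whatever training script ends up being published alongside the model.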
+1 for releasing models.
As for the rest, I'm not sure how I feel. Is there just one model for the Language
Detector? I don’t want this to become a versioning issue, where langDect.bin version 1
goes with 1.8.1 but version 2 goes with 1.8.2. Can anyone download the Leipzig
corpus? Being able to reproduce the mod
Great idea!
+1 for releasing models.
+1 to publishing models in jars on Maven Central. This is the fastest way to
get somebody started. Moreover, having an extensible mechanism for others
to do it on their own is really helpful. I did this with extJWNL for
packaging WordNet data files. It is also c
+1 to an opennlp-models jar on Maven Central that contains the models.
+1 to having the models available for download separately (if easily
possible) for users who know what they want.
+1 to having the training data shared somewhere with scripts to generate
the models. It will help protect against
+1. In terms of releasing models, maybe an opennlp-models package that
uses the Maven structure of src/main/resources//*.bin to hold
the models.
Then use an assembly descriptor to compile the above into a *-bin.jar?
Cheers,
Chris
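If models were shipped inside a jar as suggested above, client code could open them as classpath resources. The following is a minimal sketch using only the Java standard library; the class name ModelResources and the resource path "models/langdetect.bin" are hypothetical, since the thread does not fix a layout.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Optional;

// Hypothetical helper: not part of OpenNLP.
final class ModelResources {

    // Returns a stream for the given classpath resource, or an empty
    // Optional when no jar on the classpath provides it.
    static Optional<InputStream> open(String resourcePath) {
        ClassLoader loader = ModelResources.class.getClassLoader();
        return Optional.ofNullable(loader.getResourceAsStream(resourcePath));
    }

    public static void main(String[] args) throws IOException {
        // The default path is an assumption made for illustration only.
        String path = args.length > 0 ? args[0] : "models/langdetect.bin";
        Optional<InputStream> model = open(path);
        if (model.isPresent()) {
            try (InputStream in = model.get()) {
                System.out.println("Found " + path + ", first byte: " + in.read());
            }
        } else {
            System.out.println("No model resource at " + path);
        }
    }
}
```

Returning an Optional rather than null makes the "model jar not on the classpath" case explicit for callers.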
On 7/10/17, 4:09 PM, "Joern Kottmann" wrote:
My opinion about this is that we should offer the model as a Maven
dependency for users who just want to use it in their projects, and
also offer models for download for people to quickly try out OpenNLP.
If the models can be downloaded, a new user could very quickly test
it via the command line.
We need to address things such as sharing the evaluation results and how to
reproduce the training.
There are several possibilities for that, but there are points to consider:
Will we store the model itself in an SCM repository or only the code that
can build it?
Will we deploy the models to a Mav
Hello all,
since Apache OpenNLP 1.8.1 we have a new language detection component
which, like all our components, has to be trained. I think we should
release a pre-built model for it, trained on the Leipzig corpus. This
will allow the majority of our users to get started very quickly with
language de