There are many Brown clusters here: http://www.derczynski.com/sheffield/brown-tuning/
Also the Brown BLLIP clusters are available here:
http://people.csail.mit.edu/maestro/papers/bllip-clusters.gz

And if you unzip these models you will find clusters (Brown, Clark and word2vec) inside, for several languages:
http://ixa2.si.ehu.es/ixa-pipes/models/nerc-models-1.5.4.tgz
The clusters are described here (see Table 4):
https://doi.org/10.1016/j.artint.2016.05.003

Furthermore, you can find clusters induced on Yelp data (reviews) here; just unzip the models:
http://ixa2.si.ehu.es/ixa-pipes/models/ote-models-1.5.0.tgz

(A sketch of a descriptor that uses these clusters, plus a short training/loading snippet, is appended below the quoted messages.)

HTH,

R

On Tue, Jul 18, 2017 at 2:35 PM, William Colen <william.co...@gmail.com> wrote:

> Sheng,
>
> Regarding 2, take a look at this link, it can help you:
> https://github.com/ragerri/cluster-preprocessing
>
> Regarding 1, you are right. If you trained with a custom feature generator,
> it will be applied both in training and at runtime.
>
> William
>
> 2017-07-14 16:59 GMT-03:00 Sheng <sheng...@gmail.com>:
>
>> Hi,
>>
>> I am new to OpenNLP and am currently trying to learn how to train an NER
>> model. I have two questions:
>>
>> 1. If I am using a custom set of features for training, do I have to feed
>> that set of features to NameFinderME when I load the trained model? I
>> think not, since the XML descriptor is part of the artifactMap that is
>> persisted, but I may be wrong.
>>
>> 2. In the documentation on your website, you give an example of an XML
>> descriptor file for training an NER model which includes a few
>> "cluster"-based features. These features need dictionary objects,
>> instantiated from the resources. For BrownCluster, the javadoc says one
>> should download a file from metaoptimize.com/projects/wordreprs/. Do I
>> just need to load that file into BrownCluster directly? That link is
>> unreachable at the moment; is it dead for good? And what about the other
>> clusters? How can one create a word2vec cluster, and what is
>> clark.cluster?
>>
>> This is a long question. I really appreciate your patience in reading and
>> responding to it!
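
To make the above concrete, here is a sketch (not a definitive recipe) of a feature generator descriptor that wires in the three kinds of cluster files. The element and attribute names (brownclustertoken, wordcluster, dict, ...) are taken from my reading of the NER example in the OpenNLP 1.8 manual; treat them as assumptions and check the manual of the version you are actually running. The dict values are just keys that have to match the resource names supplied at training time.

<!-- featuregen.xml (sketch): default NER features plus cluster features.
     Verify element names against the manual of your OpenNLP version. -->
<generators>
  <cache>
    <generators>
      <window prevLength="2" nextLength="2">
        <tokenclass/>
      </window>
      <window prevLength="2" nextLength="2">
        <token/>
      </window>
      <definition/>
      <prevmap/>
      <bigram/>
      <sentence begin="true" end="false"/>
      <!-- Brown clusters, e.g. the unzipped bllip-clusters file -->
      <brownclustertoken dict="brownCluster"/>
      <brownclustertokenclass dict="brownCluster"/>
      <brownclusterbigram dict="brownCluster"/>
      <!-- Clark and word2vec clusters are plain word-to-cluster files -->
      <wordcluster dict="clarkCluster"/>
      <wordcluster dict="word2vecCluster"/>
    </generators>
  </cache>
</generators>

And a minimal training/loading sketch in Java. It assumes opennlp-tools 1.8.x class names (BrownCluster, WordClusterDictionary, TokenNameFinderFactory.create) and uses made-up file names (train.txt, featuregen.xml, ner-custom.bin, plus the unzipped cluster files), so adapt it to your setup. It also illustrates William's answer to question 1: the descriptor and the cluster resources end up in the model's artifact map, so at runtime NameFinderME only needs the loaded model.

// NerClusterTrainingSketch.java: a minimal sketch, not a drop-in solution.
// Class and method names are from opennlp-tools 1.8.x; check them against
// the version you actually use. File names below are just examples.
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

import opennlp.tools.namefind.BioCodec;
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.NameSampleDataStream;
import opennlp.tools.namefind.TokenNameFinderFactory;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.MarkableFileInputStreamFactory;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.TrainingParameters;
import opennlp.tools.util.featuregen.BrownCluster;
import opennlp.tools.util.featuregen.WordClusterDictionary;

public class NerClusterTrainingSketch {

  public static void main(String[] args) throws Exception {
    // Training data in OpenNLP's name sample format, one sentence per line.
    ObjectStream<String> lines = new PlainTextByLineStream(
        new MarkableFileInputStreamFactory(new File("train.txt")),
        StandardCharsets.UTF_8);
    ObjectStream<NameSample> samples = new NameSampleDataStream(lines);

    // The keys here must match the dict="..." values in featuregen.xml.
    Map<String, Object> resources = new HashMap<>();
    resources.put("brownCluster",
        new BrownCluster(new FileInputStream("bllip-clusters")));
    resources.put("clarkCluster",
        new WordClusterDictionary(new FileInputStream("clark.cluster")));
    resources.put("word2vecCluster",
        new WordClusterDictionary(new FileInputStream("word2vec.cluster")));

    byte[] featureGenBytes = Files.readAllBytes(new File("featuregen.xml").toPath());

    TokenNameFinderFactory factory = TokenNameFinderFactory.create(
        null, featureGenBytes, resources, new BioCodec());

    TokenNameFinderModel model = NameFinderME.train(
        "en", null, samples, TrainingParameters.defaultParams(), factory);

    try (FileOutputStream out = new FileOutputStream("ner-custom.bin")) {
      model.serialize(out);
    }

    // Question 1: the descriptor and resources were stored in the model's
    // artifact map above, so nothing extra is supplied when loading it.
    TokenNameFinderModel loaded = new TokenNameFinderModel(new File("ner-custom.bin"));
    NameFinderME finder = new NameFinderME(loaded);
    System.out.println(Arrays.toString(
        finder.find(new String[] {"John", "Smith", "works", "in", "London", "."})));
  }
}

As far as I know, the Brown file is the usual tab-separated paths format (bit-string path, word, count), while the Clark and word2vec files are plain word-to-cluster mappings, which is why both of the latter go through WordClusterDictionary in this sketch.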