In the end, the format described in [1] was right. The classes in charge of parsing and creating the dictionary file are [2], [3] and [4]. I managed to get the trainer to accept my dictionary.
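
For anyone hitting the same errors, here is a minimal sketch of how such a dictionary file could be generated. It is not taken from the OpenNLP sources. The element and attribute names (dictionary, entry, tags, token) are my recollection of what the serializer in [4] reads and writes, so please check them against [3] and [4]. The class name TagDictionarySketch and the sample words are made up for illustration. The sketch uses only the JDK and rejects any entry that is not a single token.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of OpenNLP: builds a tag dictionary as XML text.
public class TagDictionarySketch {

    // Escape the characters that are special in XML text and attribute values.
    public static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Build the XML for a word -> tags map. The layout is an assumption:
    // one <entry> per word, tags space-separated in a "tags" attribute,
    // and exactly one <token> child.
    public static String toXml(Map<String, List<String>> entries) {
        StringBuilder sb = new StringBuilder();
        // The XML declaration has to be the very first thing in the file.
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<dictionary>\n");
        for (Map.Entry<String, List<String>> e : entries.entrySet()) {
            String word = e.getKey();
            // Refuse empty words and words containing whitespace (several tokens).
            if (word.isEmpty() || word.matches(".*\\s.*")) {
                throw new IllegalArgumentException(
                        "entry must be exactly one token: '" + word + "'");
            }
            sb.append("<entry tags=\"")
              .append(escape(String.join(" ", e.getValue())))
              .append("\">\n");
            sb.append("<token>").append(escape(word)).append("</token>\n");
            sb.append("</entry>\n");
        }
        sb.append("</dictionary>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, List<String>> entries = new LinkedHashMap<>();
        entries.put("house", Arrays.asList("NN", "VB"));
        entries.put("the", Arrays.asList("DT"));
        System.out.print(toXml(entries));
    }
}
```

Redirecting the output of main to a file gives something to try with the trainer. An entry such as "New York" makes toXml throw instead of writing a file the trainer would refuse.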

My problem was that an entry cannot be made of several tokens...

Best

[1] http://www.mail-archive.com/opennlp-dev@incubator.apache.org/msg01352.html
[2] http://svn.apache.org/viewvc/incubator/opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/cmdline/postag/POSTaggerTrainerTool.java?view=markup
[3] http://svn.apache.org/viewvc/incubator/opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/postag/POSDictionary.java?view=markup
[4] http://svn.apache.org/viewvc/incubator/opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/dictionary/serializer/DictionarySerializer.java?view=markup

On Wed, Oct 19, 2011 at 6:12 PM, Nicolas Hernandez <nicolas.hernan...@gmail.com> wrote:
> Hi everyone,
>
> Could someone please tell me what the current POS tagger
> dictionary format is?
> I would like to use the POSTaggerTrainer from the command line.
>
> First I tried to generate it based on
> http://www.mail-archive.com/opennlp-dev@incubator.apache.org/msg01352.html
> Second (since the previous attempt failed) I looked at the code
> http://svn.apache.org/viewvc/incubator/opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/postag/POSTaggerTrainer.java?view=markup
> But I still get errors...
> [Fatal Error] :1:1: Content is not allowed in prolog.
> IO error while reading training data or indexing data: The profile
> data stream has an invalid format!
>
> Thank you for your help
>
> /Nicolas
>

--
nicolas.hernan...@univ-nantes.fr # http://enicolashernandez.blogspot.com
http://www.univ-nantes.fr/hernandez-n # Laboratoire Informatique de Nantes Atlantique
CNRS UMR 6241 tel. +33 (0)2 51 12 58 55 # Université de Nantes - Institut Universitaire de Technologie - Département Informatique
tel. +33 (0)2 40 30 60 67