In the end, the format described in [1] was correct.
The classes in charge of parsing and creating the dictionary file are
[2], [3] and [4].
I managed to get the trainer to accept my dictionary.

My problem was that a dictionary entry cannot consist of several tokens: each entry must be a single token.
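
For anyone hitting the same errors, here is a minimal sketch of generating such a dictionary file while enforcing the one-token-per-entry rule. The element and attribute names (`dictionary`, `entry`, `tags`, `token`) are my assumption of what the serializer in [4] expects, and `build_tag_dictionary` is a hypothetical helper, not part of OpenNLP; please check both against [3] and [4] before relying on them.

```python
import xml.etree.ElementTree as ET


def build_tag_dictionary(word_tags):
    """Build an XML tag dictionary from a {word: [tags]} mapping.

    The element and attribute names used here are assumptions;
    verify them against DictionarySerializer [4].
    """
    root = ET.Element("dictionary")
    for word, tags in sorted(word_tags.items()):
        # An entry cannot be made of several tokens, so reject
        # anything that is empty or contains whitespace.
        if len(word.split()) != 1:
            raise ValueError("entry must be a single token: %r" % word)
        entry = ET.SubElement(root, "entry", {"tags": " ".join(tags)})
        ET.SubElement(entry, "token").text = word
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    body = build_tag_dictionary({"run": ["NN", "VB"], "the": ["DT"]})
    # The file has to be XML from its first character; a plain-text
    # dictionary is a likely cause of "Content is not allowed in prolog".
    print('<?xml version="1.0" encoding="UTF-8"?>')
    print(body)
```

A multi-word key such as "New York" raises a ValueError here instead of producing a file that the trainer would reject later.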

Best

[1] http://www.mail-archive.com/opennlp-dev@incubator.apache.org/msg01352.html
[2] http://svn.apache.org/viewvc/incubator/opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/cmdline/postag/POSTaggerTrainerTool.java?view=markup
[3] http://svn.apache.org/viewvc/incubator/opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/postag/POSDictionary.java?view=markup
[4] http://svn.apache.org/viewvc/incubator/opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/dictionary/serializer/DictionarySerializer.java?view=markup




On Wed, Oct 19, 2011 at 6:12 PM, Nicolas Hernandez
<nicolas.hernan...@gmail.com> wrote:
> Hi Everyone,
>
> could someone please tell me what the current POS tagger
> dictionary format is?
> I would like to use the POSTaggerTrainer from the command line.
>
> First I attempted to generate it based on
> http://www.mail-archive.com/opennlp-dev@incubator.apache.org/msg01352.html
> Second (since the previous attempt failed) I looked at the code
> http://svn.apache.org/viewvc/incubator/opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/postag/POSTaggerTrainer.java?view=markup
> But I still get errors:
> [Fatal Error] :1:1: Content is not allowed in prolog.
> IO error while reading training data or indexing data: The profile
> data stream has an invalid format!
>
> Thank you for your help
>
> /Nicolas
>



-- 
nicolas.hernan...@univ-nantes.fr
#
http://enicolashernandez.blogspot.com
http://www.univ-nantes.fr/hernandez-n
#
Laboratoire Informatique de Nantes Atlantique CNRS UMR 6241
tel. +33 (0)2 51 12 58 55
#
Université de Nantes - Institut Universitaire de Technologie -
Département Informatique
tel. +33 (0)2 40 30 60 67
