You need to fetch the manifest from the artifact map and then
put the chars into the manifest itself, rather than adding them
to the artifact map as a separate entry.

Please see TokenizerModel.useAlphaNumericOptimization for an
example of how to do that.
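
A minimal sketch of the idea, using only java.util.Properties as a stand-in for the real manifest. The class name EosManifestSketch, the property key and both method names are made up for illustration and are not OpenNLP API. In SentenceModel the String would presumably go through setManifestProperty, as is already done for TOKEN_END_PROPERTY, instead of artifactMap.put:

```java
import java.util.Arrays;
import java.util.Properties;

// Hypothetical helper, not part of OpenNLP: keeps the EOS characters in the
// manifest (a Properties object) as a plain String property.
final class EosManifestSketch {

    // Assumed property key; the real name is up to the patch author.
    static final String EOS_CHARACTERS_PROPERTY = "eosCharacters";

    // Store the characters as one String; a null array leaves the manifest untouched.
    static void storeEosChars(Properties manifest, char[] eosCharacters) {
        if (eosCharacters != null) {
            manifest.setProperty(EOS_CHARACTERS_PROPERTY, new String(eosCharacters));
        }
    }

    // Return the stored characters, or null if the property was never set,
    // so the caller can fall back to the language-dependent defaults.
    static char[] loadEosChars(Properties manifest) {
        String value = manifest.getProperty(EOS_CHARACTERS_PROPERTY);
        return value == null ? null : value.toCharArray();
    }

    public static void main(String[] args) {
        Properties manifest = new Properties();
        storeEosChars(manifest, new char[] {'.', '?', '!'});
        System.out.println(Arrays.toString(loadEosChars(manifest)));
    }
}
```

Since the value is an ordinary String property, it travels with the manifest and should not need a serializer of its own. A getEosChars method on the model could then simply wrap the lookup.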

Jörn

On Thu, Feb 9, 2012 at 2:20 PM, Katrin Tomanek
<[email protected]> wrote:

> Hi Jörn,
>
> I did that:
>
>
>  public SentenceModel(String languageCode, AbstractModel sentModel,
>      boolean useTokenEnd, Dictionary abbreviations, char[] eosCharacters,
>      Map<String, String> manifestInfoEntries) {
>
>    super(COMPONENT_NAME, languageCode, manifestInfoEntries);
>
>    artifactMap.put(MAXENT_MODEL_ENTRY_NAME, sentModel);
>
>    setManifestProperty(TOKEN_END_PROPERTY, Boolean.toString(useTokenEnd));
>
>    // Abbreviations are optional
>    if (abbreviations != null)
>        artifactMap.put(ABBREVIATIONS_ENTRY_NAME, abbreviations);
>
>    // EOS characters are optional
>    if (eosCharacters != null)
>        artifactMap.put(EOS_CHARACTERS_ENTRY_NAME,
>            eosCharArrayToString(eosCharacters));
>
>    checkArtifactMap();
>  }
>
> The EOS char array is transformed to a string which is written to the
> manifest.
>
> Still, when serializing the model, I get:
>
>
> Exception in thread "main" java.lang.IllegalStateException: Missing
> serializer for eosCharacters
>
>
> Best,
> Katrin
>
>
> On 02/09/2012 12:48 PM, Joern Kottmann wrote:
>
>> The artifactMap contains a manifest (a Properties object).
>> You should store the EOS chars in this manifest. We need a smart way to
>> convert them into a String.
>>
>> The Sentence Detector should then retrieve the EOS chars from the model,
>> e.g. via a method getEosChars.
>>
>> Have a look at the other model classes as well, e.g. the tokenizer model.
>> It stores some settings in the manifest. That is a good place to look for
>> a code sample.
>>
>> Jörn
>>
>>
>> On Thu, Feb 9, 2012 at 12:38 PM, Katrin Tomanek
>> <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I am moving the discussion on making the EOS characters of the sentence
>>> splitter configurable to the dev list (it was previously on the user
>>> list).
>>>
>>> I am currently trying to make the EOS characters a parameter of the
>>> SentenceDetectorME and store them as a model parameter.
>>>
>>> Thus far, this works fine (although it requires changes in quite a few
>>> places in the code).
>>>
>>> I am putting a "char[] eosCharacters" into the artifactMap in
>>> SentenceModel. When predicting with a model, I test whether the EOS
>>> parameter is set; if so, I use these EOS symbols, otherwise the
>>> language-dependent ones.
>>>
>>> However, I now run into trouble when serializing the model with
>>> the new "char[]" parameter:
>>>
>>> Writing sentence detector model ... Exception in thread "main"
>>> java.lang.IllegalStateException: Missing serializer for eosCharacters
>>>
>>> I know that I would have to write such a serializer, but I am a bit
>>> lost here. Any hints? Maybe there is already a serializer for char[]
>>> that I could easily use.
>>>
>>> Best
>>> Katrin
>>>
>>>
>>
>
> --
> Dr. Katrin Tomanek
> Averbis GmbH
> Tennenbacher Strasse 11
> D-79106 Freiburg
>
> Fon: +49 (0) 761 - 203 97696
> Fax: +49 (0) 761 - 203 97694
> E-Mail: [email protected]
>
> Geschäftsführer: Dr. med. Philipp Daumke, Dr. Kornél Markó
> Sitz der Gesellschaft: Freiburg i. Br.
> AG Freiburg i. Br., HRB 701080
>
