Hi Jörn,
I have made all changes and added a patch to the JIRA-issue.
What are the next steps?
And btw: when do you plan to release 1.5.3?
Best
Katrin
On 02/09/2012 02:26 PM, Joern Kottmann wrote:
You need to fetch the manifest from the artifact map and then
put the chars into the manifest itself.
Please see TokenizerModel.useAlphaNumericOptimization on how
to do that.
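
A minimal sketch of that idea, not the actual OpenNLP code: the manifest is a Properties object that lives inside the artifact map, and the EOS chars become a property of that manifest rather than a new artifact entry. The class name ManifestSketch, the key "eosCharacters", the "manifest.properties" entry name, and the fallback parameter of getEosChars are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class ManifestSketch {
    // Assumed entry name under which the manifest is kept in the artifact map.
    public static final String MANIFEST_ENTRY = "manifest.properties";
    // Hypothetical property key for the EOS characters.
    public static final String EOS_CHARACTERS_PROPERTY = "eosCharacters";

    private final Map<String, Object> artifactMap = new HashMap<>();

    public ManifestSketch(char[] eosCharacters) {
        artifactMap.put(MANIFEST_ENTRY, new Properties());
        // EOS characters are optional; when present they go into the manifest,
        // so no additional artifact (and no additional serializer) is needed.
        if (eosCharacters != null) {
            Properties manifest = (Properties) artifactMap.get(MANIFEST_ENTRY);
            manifest.setProperty(EOS_CHARACTERS_PROPERTY, new String(eosCharacters));
        }
    }

    // Returns the stored EOS chars, or the given defaults if none were stored.
    public char[] getEosChars(char[] defaults) {
        Properties manifest = (Properties) artifactMap.get(MANIFEST_ENTRY);
        String value = manifest.getProperty(EOS_CHARACTERS_PROPERTY);
        return value != null ? value.toCharArray() : defaults;
    }

    // Number of entries in the artifact map (only the manifest in this sketch).
    public int artifactCount() {
        return artifactMap.size();
    }

    public static void main(String[] args) {
        ManifestSketch model = new ManifestSketch(new char[] {'.', '?'});
        System.out.println(new String(model.getEosChars(new char[] {'.'})));
    }
}
```

The point of the sketch is that the artifact map still holds only the manifest, so the "Missing serializer" check has nothing new to complain about.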
Jörn
On Thu, Feb 9, 2012 at 2:20 PM, Katrin Tomanek
<[email protected]> wrote:
Hi Jörn,
I did that:
public SentenceModel(String languageCode, AbstractModel sentModel,
    boolean useTokenEnd, Dictionary abbreviations, char[] eosCharacters,
    Map<String, String> manifestInfoEntries) {

  super(COMPONENT_NAME, languageCode, manifestInfoEntries);

  artifactMap.put(MAXENT_MODEL_ENTRY_NAME, sentModel);

  setManifestProperty(TOKEN_END_PROPERTY, Boolean.toString(useTokenEnd));

  // Abbreviations are optional
  if (abbreviations != null)
    artifactMap.put(ABBREVIATIONS_ENTRY_NAME, abbreviations);

  // EOS characters are optional
  if (eosCharacters != null)
    artifactMap.put(EOS_CHARACTERS_ENTRY_NAME,
        eosCharArrayToString(eosCharacters));

  checkArtifactMap();
}
The EOS char array is transformed to a string, which is written to the
manifest.
Still, when serializing the model, I get:
Exception in thread "main" java.lang.IllegalStateException: Missing
serializer for eosCharacters
Best,
Katrin
On 02/09/2012 12:48 PM, Joern Kottmann wrote:
The artifactMap contains a manifest (that is a Properties object).
You should store the EOS chars in this manifest. We need a smart way to
convert them into a String.
The Sentence Detector should then retrieve the EOS chars from the model,
e.g. via a method getEosChars.
Have a look at the other model classes as well, e.g. the tokenizer model.
It stores some settings in the manifest. That is a good place to look for
a code sample.
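
One simple way to do the conversion, sketched here under assumptions: each EOS char is a single Java char, so the array can be concatenated into a String and split again with toCharArray. The class name EosCharsCodec and the key "eosCharacters" are made up for this example; the round trip through store/load only uses the standard java.util.Properties API and is meant to show that such a value survives being written out and read back.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

public class EosCharsCodec {
    // Hypothetical property key, chosen for this sketch only.
    public static final String EOS_CHARACTERS_PROPERTY = "eosCharacters";

    // Concatenates the EOS chars into one String, e.g. {'.', '?'} -> ".?".
    public static String encode(char[] eosCharacters) {
        return new String(eosCharacters);
    }

    // Splits the stored String back into individual EOS chars.
    public static char[] decode(String value) {
        return value.toCharArray();
    }

    // Writes the encoded chars to a Properties object, serializes it to text,
    // loads it again, and returns the decoded chars.
    public static char[] roundTrip(char[] eosCharacters) {
        try {
            Properties manifest = new Properties();
            manifest.setProperty(EOS_CHARACTERS_PROPERTY, encode(eosCharacters));

            StringWriter out = new StringWriter();
            manifest.store(out, null);

            Properties restored = new Properties();
            restored.load(new StringReader(out.toString()));
            return decode(restored.getProperty(EOS_CHARACTERS_PROPERTY));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        char[] restored = roundTrip(new char[] {'.', '!', '?'});
        System.out.println(new String(restored));
    }
}
```

Properties escapes characters such as '!' and '#' on store and unescapes them on load, so punctuation used as EOS chars should not need special handling here.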
Jörn
On Thu, Feb 9, 2012 at 12:38 PM, Katrin Tomanek
<[email protected]> wrote:
Hi,
I am moving the discussion on making the EOS characters of the sentence
splitter configurable to the dev list (it was previously on the user list).
I am currently trying to make the EOS characters a parameter of the
SentenceDetectorME and store them as a model parameter.
Thus far, this works fine (although it requires changes in quite a few
places in the code).
I am putting a "char[] eosCharacters" to the artifactMap in
SentenceModel.
When predicting with a model, I test whether the eos parameter is set and
if so I use these eos symbols, otherwise the language dependent ones.
Anyway, I am now running into trouble when serializing the model with
the new "char[]" parameter:
Writing sentence detector model ... Exception in thread "main"
java.lang.IllegalStateException: Missing serializer for eosCharacters
I know that I would have to write such a serializer; however, I am a bit
lost here. Any hints? (Maybe there is already a serializer for char[]
which I could easily use.)
Best
Katrin
--
Dr. Katrin Tomanek
Averbis GmbH
Tennenbacher Strasse 11
D-79106 Freiburg
Fon: +49 (0) 761 - 203 97696
Fax: +49 (0) 761 - 203 97694
E-Mail: [email protected]
Geschäftsführer: Dr. med. Philipp Daumke, Dr. Kornél Markó
Sitz der Gesellschaft: Freiburg i. Br.
AG Freiburg i. Br., HRB 701080