So I found a simple example in the sources:

WordTagSampleStreamTest.java parses the string "This_x1 is_x2 a_x3 test_x4
sentence_x5 ._x6" using POSSample.
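
To make the format concrete, here is a rough plain-Java sketch of how I read
the word_tag convention. It is not the OpenNLP code itself: the class name
WordTagSketch and its parse method are made up for illustration, and I am
assuming the tag is whatever follows the last underscore of each token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenNLP: illustrates the word_tag format only.
public class WordTagSketch {

    /** Splits each "word_tag" token on its last underscore; returns {words, tags}. */
    static String[][] parse(String sentence) {
        List<String> words = new ArrayList<>();
        List<String> tags = new ArrayList<>();
        for (String token : sentence.trim().split("\\s+")) {
            int split = token.lastIndexOf('_');
            if (split < 0) {
                // Assumption: every token must carry a tag in this format.
                throw new IllegalArgumentException("No tag found in token: " + token);
            }
            words.add(token.substring(0, split));
            tags.add(token.substring(split + 1));
        }
        return new String[][] {
            words.toArray(new String[0]),
            tags.toArray(new String[0])
        };
    }

    public static void main(String[] args) {
        String[][] parsed = parse("This_x1 is_x2 a_x3 test_x4 sentence_x5 ._x6");
        for (int i = 0; i < parsed[0].length; i++) {
            System.out.println(parsed[0][i] + " -> " + parsed[1][i]);
        }
    }
}
```

Splitting on the last underscore rather than the first keeps words that
themselves contain an underscore intact.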

As I understand it, the normal approach involves a few steps for each language:
1. collect training data for the model
2. create a POS dictionary like this:
<dictionary>
<entry tags="x1">
<token>This</token>
</entry>
<entry tags="x2">
<token>is</token>
</entry>
<entry tags="x3">
<token>a</token>
</entry>
...

3. train the model with this dictionary
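
I suppose step 2 could be generated from the tagged data instead of being
written by hand. Below is a minimal plain-Java sketch of that idea. It does not
use the OpenNLP API; TagDictionarySketch, collect and toXml are hypothetical
names. I am also assuming that a word with several tags lists them
space-separated in the tags attribute and that the file ends with a closing
</dictionary> tag, neither of which is visible in the excerpt above.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of OpenNLP: derives dictionary entries from tagged text.
public class TagDictionarySketch {

    /** Collects, for every tagged word, the set of tags it was seen with. */
    static Map<String, Set<String>> collect(String... taggedSentences) {
        Map<String, Set<String>> dict = new LinkedHashMap<>();
        for (String sentence : taggedSentences) {
            for (String token : sentence.trim().split("\\s+")) {
                int split = token.lastIndexOf('_');
                if (split < 0) {
                    continue; // untagged token such as "quick": no entry
                }
                dict.computeIfAbsent(token.substring(0, split), k -> new LinkedHashSet<>())
                    .add(token.substring(split + 1));
            }
        }
        return dict;
    }

    /** Renders the entries in the layout shown above; no XML escaping is done. */
    static String toXml(Map<String, Set<String>> dict) {
        StringBuilder sb = new StringBuilder("<dictionary>\n");
        for (Map.Entry<String, Set<String>> e : dict.entrySet()) {
            // Assumption: multiple tags are joined with a single space.
            sb.append("<entry tags=\"").append(String.join(" ", e.getValue())).append("\">\n");
            sb.append("<token>").append(e.getKey()).append("</token>\n");
            sb.append("</entry>\n");
        }
        return sb.append("</dictionary>\n").toString();
    }

    public static void main(String[] args) {
        System.out.print(toXml(collect("This_x1 is_x2 a_x3 test_x4 sentence_x5 ._x6")));
    }
}
```

With input like "fox_animal jumps_action over dog_animal", only the tagged
words would get entries, which matches the partial tagging in my original
example.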

Is this the right approach? Is the POS Tagger appropriate for this task?

Thanks in advance,
Yakov

On Tue, Aug 27, 2013 at 6:31 PM, Yakov Keranchuk
<[email protected]> wrote:

> Hi
>
> Is it possible to tag tokens with our own rules?
> Example: The quick brown fox_animal jumps_action over the lazy dog_animal
>
> Do we need to create a custom dictionary for the POS tagger?
> If so, can a single dictionary be used for several languages?
>
> Best regards,
> Yakov
>
