Thanks for the advice, I will train my own model.
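For anyone who also trains their own money model: OpenNLP name finder training data is plain text with one tokenized sentence per line, and entities are wrapped in <START:money> ... <END> markers. The helper below is a hypothetical sketch. The class name TrainingLineFormatter is made up and is not an OpenNLP API. It turns tokens plus token spans into such a line. The spans use the same [start..end) convention as the "[2..4) money" output further down. The resulting file could then be handed to a trainer such as the opennlp TokenNameFinderTrainer command-line tool. Check the OpenNLP manual for its exact options.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of OpenNLP): writes one training sentence in the
// "<START:type> ... <END>" format used for name finder training data.
final class TrainingLineFormatter {

    private TrainingLineFormatter() {}

    /**
     * @param tokens tokens of one sentence, already split (e.g. by SimpleTokenizer)
     * @param starts inclusive start token index of each entity
     * @param ends   exclusive end token index of each entity, as in "[2..4) money"
     * @param type   entity type, e.g. "money"
     * Assumes the spans do not overlap.
     */
    static String format(String[] tokens, int[] starts, int[] ends, String type) {
        if (starts.length != ends.length) {
            throw new IllegalArgumentException("starts and ends must have the same length");
        }
        for (int k = 0; k < starts.length; k++) {
            if (starts[k] < 0 || starts[k] >= ends[k] || ends[k] > tokens.length) {
                throw new IllegalArgumentException("bad span [" + starts[k] + ".." + ends[k] + ")");
            }
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            // Open a marker before the first token of an entity.
            for (int s : starts) {
                if (s == i) out.add("<START:" + type + ">");
            }
            out.add(tokens[i]);
            // Close the marker after the last token (end index is exclusive).
            for (int e : ends) {
                if (e == i + 1) out.add("<END>");
            }
        }
        return String.join(" ", out);
    }
}
```

For example, format(new String[] {"buy", "milk", "$", "2"}, new int[] {2}, new int[] {4}, "money") yields the line: buy milk <START:money> $ 2 <END>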

On Thu, May 23, 2013 at 5:25 PM, Jörn Kottmann <[email protected]> wrote:

> No, there are no tokenization differences in your samples. Try using
> capital USD instead of usd.
>
> The model was trained on English news text from the 90s; try giving it
> some (old) news articles for testing.
>
> Jörn
>
>
> On 05/23/2013 03:16 PM, Яков Керанчук wrote:
>
>> My experiments show the following: only numbers marked with $ are
>> detected as money.
>>
>> ------sentences: [The drop last week unwound most of the prior week's
>> jump,
>> suggesting employers were not laying off workers in response to tighter
>> fiscal policy, especially the $85 billion in across-the-board government
>> spending cuts that have dampened factory activity]
>> ------tokenizing
>> ------finding money
>> [[29..32) money]
>> [29..32) money
>> prepare model
>> ------sentences: [buy milk $2]
>> ------tokenizing
>> buy
>> milk
>> $
>> 2
>> ------finding money
>> [[2..4) money]
>> [2..4) money
>> ------pos tagging
>> VB
>> NN
>> $
>> CD
>> ------saving message to database
>> prepare model
>> ------sentences: [buy milk usd 2]
>> ------tokenizing
>> buy
>> milk
>> usd
>> 2
>> ------finding money
>> []
>> ------pos tagging
>> VB
>> NN
>> CD
>> ------saving message to database
>> prepare model
>> ------sentences: [Buy milk two Dollars]
>> ------tokenizing
>> Buy
>> milk
>> two
>> Dollars
>> ------finding money
>> []
>>
>> I have not noticed any difference between SimpleTokenizer and TokenizerME
>> in this case.
>>
>>
>> On Thu, May 23, 2013 at 5:00 PM, Jörn Kottmann <[email protected]>
>> wrote:
>>
>>  On 05/23/2013 02:56 PM, Яков Керанчук wrote:
>>>
>>>> Thanks for the suggestion about training my own model, I'll try it.
>>>>
>>>> I use the standard en-token.bin model; the text contains words in mixed
>>>> upper and lower case.
>>>>
>>> For the English model you should use the SimpleTokenizer; the token
>>> output from the en-token.bin model is not compatible with the training
>>> data.
>>>
>>> Jörn
>>>
>>>
>>
>>
>

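On the tokenizer question: OpenNLP's SimpleTokenizer tokenizes by character classes (letters, digits, whitespace, other symbols), which is consistent with "buy milk $2" coming out as the four tokens buy / milk / $ / 2 in the log above. The sketch below is a simplified, hypothetical re-implementation of that idea. CharClassTokenizer is a made-up name, this is not OpenNLP's source, and details such as how runs of punctuation are grouped may differ. In real code you would call opennlp.tools.tokenize.SimpleTokenizer.INSTANCE.tokenize(text) instead.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical character-class tokenizer illustrating the idea behind
// OpenNLP's SimpleTokenizer. It is NOT the OpenNLP implementation.
final class CharClassTokenizer {

    private enum CharClass { WHITESPACE, LETTER, DIGIT, OTHER }

    private CharClassTokenizer() {}

    private static CharClass classify(char c) {
        if (Character.isWhitespace(c)) return CharClass.WHITESPACE;
        if (Character.isLetter(c)) return CharClass.LETTER;
        if (Character.isDigit(c)) return CharClass.DIGIT;
        return CharClass.OTHER;
    }

    /** Splits at whitespace and wherever the character class changes; every symbol is its own token. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        CharClass previous = CharClass.WHITESPACE;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            CharClass cls = classify(c);
            // A new class, or any symbol character, ends the token built so far.
            boolean boundary = cls != previous || cls == CharClass.OTHER;
            if (boundary && current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
            // Whitespace only separates tokens; it is never part of one.
            if (cls != CharClass.WHITESPACE) current.append(c);
            previous = cls;
        }
        if (current.length() > 0) tokens.add(current.toString());
        return tokens;
    }
}
```

With this scheme "$85 billion" becomes $ / 85 / billion, while "usd" stays a single lower-case token, so the tokenizer alone cannot turn it into something the money model recognizes.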

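Following the suggestion to write USD instead of usd: if the input text cannot be controlled, one option is to upper-case known currency codes after tokenizing and before running the name finder. This is a hypothetical preprocessing step. CurrencyCodeNormalizer and its short code list are assumptions, not part of OpenNLP, and this thread did not test whether the pretrained money model actually detects "USD 2".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical preprocessing step (not part of OpenNLP): upper-cases tokens that
// match a small, illustrative list of ISO 4217 currency codes.
final class CurrencyCodeNormalizer {

    // Assumed example list; extend it to whatever currencies your text contains.
    private static final Set<String> CODES =
            new HashSet<>(Arrays.asList("USD", "EUR", "GBP", "JPY", "RUB"));

    private CurrencyCodeNormalizer() {}

    /** Returns a copy of the tokens with recognized currency codes in upper case. */
    static String[] normalize(String[] tokens) {
        String[] out = tokens.clone();
        for (int i = 0; i < out.length; i++) {
            // Locale.ROOT avoids locale-specific case rules (e.g. Turkish dotless i).
            String upper = out[i].toUpperCase(Locale.ROOT);
            if (CODES.contains(upper)) {
                out[i] = upper;
            }
        }
        return out;
    }
}
```

The input array is left unchanged, so the original tokens remain available for mapping the found spans back to the text.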
-- 
Best regards,
Yakov Keranchuk
+79263768032
