Hi Matt,
thanks for pushing this forward, +1 from me.
One concern I have is the licensing of the language packs: can we
distribute them under the AL2 license (as "convenience" binaries, since the
official release consists of the Joshua source code)?
I'm asking this because in OpenNLP we have had thi
I don't see why not?
> On Oct 14, 2016, at 3:36 AM, Tommaso Teofili wrote:
>
> Hi Matt,
>
> thanks for pushing this forward, +1 from me.
> One concern I have is related to the language packs licensing, can we
> distribute them under AL2 license ? (as "convenience" binaries as the
> official
Hi folks,
There is a bug in Thrax related to floating point underflow and the computation
of the rarity penalty. I'm training large models over Europarl and other
datasets for the Spanish–English language pack, and in an attempt to filter the
models down to the hundred most frequent candidates,
And by "very highly attested word pairs", I mean "any word pair with a count ≥
15" (!).
I am changing this to return
1 + Math.log(annotation.count())
and will commit this after testing.
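
For anyone following along, here is a rough sketch of how the two
quantities behave as counts grow. It assumes the current penalty has the
form exp(1 - count), which is not spelled out in this thread, and the class
and method names are made up for illustration, not taken from Thrax:

```java
public class RarityPenaltySketch {
    // ASSUMPTION: the existing penalty is exp(1 - count). This form is not
    // stated in the thread; it is used here only for illustration.
    static double rarityPenalty(long count) {
        return Math.exp(1 - count);
    }

    // The replacement proposed above: 1 + log(count).
    static double logCount(long count) {
        return 1 + Math.log(count);
    }

    public static void main(String[] args) {
        long[] counts = {1, 2, 15, 100, 1000};
        for (long c : counts) {
            // Print both values side by side for each count.
            System.out.printf("count=%d  exp(1-count)=%.6f  1+log(count)=%.6f%n",
                    c, rarityPenalty(c), logCount(c));
        }
    }
}
```

Under that assumption, the exponential form is already below 1e-6 at a count
of 15 and reaches exactly 0.0 in double precision for large counts, while
1 + log(count) keeps growing slowly and stays distinguishable.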
matt
> On Oct 14, 2016, at 12:25 PM, Matt Post wrote:
>
> Hi folks,
>
> There is a bug in Thrax
Hi Matt,
Good catch! If you go for 1 + log(count) [any reason for the '1 +'?] it
probably shouldn't be called RarityPenalty anymore :)
Cheers,
Felix
On Fri, 14 Oct 2016 at 18:34, Matt Post wrote:
> And by "very highly attested word pairs", I mean "any word pair with a
> count ≥ 15" (!).
> I am chang
On second thought, this isn't a bug. The penalty only penalizes low-count
pairs, as designed.
The problem is that I need rule counts, but I think the solution is to follow
the Moses route and add those counts as a subsequent field.
matt
> On Oct 14, 2016, at 2:27 PM, Felix Hieber wrote:
>
> H