Re: Joshua 6.1

2016-10-14 Thread Tommaso Teofili
Hi Matt, thanks for pushing this forward, +1 from me. One concern I have is the licensing of the language packs: can we distribute them under the AL2 license (as "convenience" binaries, since the official release consists of the Joshua source code)? I'm asking this because in OpenNLP we have had thi

Re: Joshua 6.1

2016-10-14 Thread Matt Post
I don't see why not?

> On Oct 14, 2016, at 3:36 AM, Tommaso Teofili wrote:
>
> Hi Matt,
>
> thanks for pushing this forward, +1 from me.
> One concern I have is related to the language packs licensing, can we
> distribute them under AL2 license ? (as "convenience" binaries as the
> official

thrax bug with rarity penalty

2016-10-14 Thread Matt Post
Hi folks, There is a bug in Thrax related to floating point underflow and the computation of the rarity penalty. I'm training large models over Europarl and other datasets for the Spanish–English language pack, and in an attempt to filter the models down to the hundred most frequent candidates,

Re: thrax bug with rarity penalty

2016-10-14 Thread Matt Post
And by "very highly attested word pairs", I mean "any word pair with a count ≥ 15" (!). I am changing this to return 1 + Math.log(annotation.count()) and will commit this after testing.

matt

> On Oct 14, 2016, at 12:25 PM, Matt Post wrote:
>
> Hi folks,
>
> There is a bug in Thrax
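
For illustration, here is a minimal sketch of the value proposed above, 1 + Math.log(count), evaluated for a few counts. Only the expression itself comes from the message; the class name LogCountSketch, the method logCountFeature, the guard against counts below 1, and the sample counts are hypothetical and are not taken from Thrax.

```java
public class LogCountSketch {

    // Hypothetical helper mirroring the expression quoted in the message:
    // 1 + Math.log(count). This is not Thrax's actual class or method.
    static double logCountFeature(int count) {
        if (count < 1) {
            // Assumption: counts are positive; log(0) would be -Infinity.
            throw new IllegalArgumentException("count must be >= 1");
        }
        return 1 + Math.log(count);
    }

    public static void main(String[] args) {
        // Sample counts chosen for illustration only.
        int[] counts = {1, 15, 100, 1000000};
        for (int c : counts) {
            System.out.printf("count=%d value=%.4f%n", c, logCountFeature(c));
        }
    }
}
```

Under this form a pair seen exactly once gets the value 1, because Math.log(1) is 0, and the value grows slowly as the count increases.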

Re: thrax bug with rarity penalty

2016-10-14 Thread Felix Hieber
Hi Matt,

Good catch! If you go for 1 + log(count) [any reason for the '1 +'?] it probably shouldn't be called RarityPenalty anymore :)

Cheers,
Felix

On Fri, 14 Oct 2016 at 18:34, Matt Post wrote:
And by "very highly attested word pairs", I mean "any word pair with a count ≥ 15" (!). I am chang

Re: thrax bug with rarity penalty

2016-10-14 Thread Matt Post
On second thought, this isn't a bug. The penalty only penalizes low-count pairs, as designed. The problem is that I need rule counts, but I think the solution is to follow the Moses route and add those counts as a subsequent field.

matt

> On Oct 14, 2016, at 2:27 PM, Felix Hieber wrote:
>
> H