Matt Post created JOSHUA-284:
--------------------------------

             Summary: Phrase-based decoding changes
                 Key: JOSHUA-284
                 URL: https://issues.apache.org/jira/browse/JOSHUA-284
             Project: Joshua
          Issue Type: Bug
            Reporter: Matt Post
             Fix For: 6.1

Joshua's phrase-based decoding complicates the pipeline. Currently, phrase-based rules are simply left-branching Hiero rules, so a nonterminal has to be prepended to both sides of every rule before packing or loading. For example, Thrax will extract

    [X] ||| yo quiero ||| i want ||| ...

This has to be changed to

    [X] ||| [X,1] yo quiero ||| [X,1] i want ||| ...

As a result, the phrase tables for the two decoders share a format, yet each table is specific to either the Hiero or the phrase-based decoder.

A better idea would be to change the phrase-based decoder a bit so that, instead of using left-branching phrase rules, it makes use of proper glue rules, the same way Hiero does. The advantages are:

- both formalisms would use the same format
- both formalisms would have a glue grammar
- there should be no impact on running time
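
For reference, the rewrite that the pipeline currently has to apply to each Thrax rule (prepending a nonterminal to the source and target sides) could be sketched roughly as below. This is only an illustration in Python, not code from Joshua or Thrax; the function name to_left_branching and its nt parameter are made up for this example, and it assumes rules are " ||| "-delimited with the left-hand side, source, and target as the first three fields.

```python
def to_left_branching(rule: str, nt: str = "X") -> str:
    """Prepend a linked nonterminal to the source and target of a phrase rule.

    Hypothetical helper for illustration; assumes the fields are
    LHS ||| source ||| target ||| (anything else, passed through unchanged).
    """
    fields = [field.strip() for field in rule.split("|||")]
    if len(fields) < 3:
        raise ValueError(f"expected at least LHS, source and target: {rule!r}")
    lhs, source, target, *rest = fields
    link = f"[{nt},1]"  # the same index on both sides links the two nonterminals
    return " ||| ".join([lhs, f"{link} {source}", f"{link} {target}", *rest])


if __name__ == "__main__":
    # The trailing "..." stands in for the elided feature fields.
    print(to_left_branching("[X] ||| yo quiero ||| i want ||| ..."))
```

With glue rules in the phrase-based decoder, as proposed, a step like this would no longer be needed and the Thrax output could be used as-is by both decoders.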