[Moses-support] cleaning the corpus prunes entire dataset

2014-12-04 Thread Jaya Kumaran
Hi, when I run clean-corpus-n.perl with max 1000 on a dataset of 14k lines (tourism corpus), I get only 2.5k lines in the clean corpus. I see that, in addition to removing blank lines and lines longer than 1000 (max) words, the script removes lines that violate Giza's 9:1 sentence-length ratio. I don
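The three filters described in the message (blank lines, a maximum length in words, and a 9:1 length ratio between the two sides) can be sketched as follows. This is a minimal illustration assuming whitespace tokenization and a ratio checked in both directions. It is not the code of clean-corpus-n.perl itself, and the function names `keep_pair` and `clean` are hypothetical.

```python
def keep_pair(src: str, tgt: str, max_len: int = 1000, ratio: float = 9.0) -> bool:
    """Return True if a sentence pair survives the blank, length and ratio filters.

    Hypothetical sketch: tokens are counted by splitting on whitespace.
    """
    src_len = len(src.split())
    tgt_len = len(tgt.split())
    # Blank (zero-token) lines on either side are dropped; this also avoids division by zero.
    if src_len == 0 or tgt_len == 0:
        return False
    # Overly long sentences on either side are dropped.
    if src_len > max_len or tgt_len > max_len:
        return False
    # Pairs whose lengths differ by more than `ratio` in either direction are dropped.
    if src_len / tgt_len > ratio or tgt_len / src_len > ratio:
        return False
    return True


def clean(pairs, **kwargs):
    """Keep only the (source, target) pairs that pass keep_pair."""
    return [pair for pair in pairs if keep_pair(*pair, **kwargs)]


pairs = [
    ("hello world", "bonjour le monde"),                        # kept
    ("", "vide"),                                              # blank source: dropped
    ("a", "one two three four five six seven eight nine ten"),  # 10:1 ratio: dropped
]
print(clean(pairs))  # [('hello world', 'bonjour le monde')]
```

Because every pair is judged only by its own two lines, a large drop from these filters alone would mean that many pairs really are blank, very long, or very unbalanced in length.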

Re: [Moses-support] How to train a tree-to-tree model?

2014-12-04 Thread Rico Sennrich
Steven Huang writes: > > It seems that the XML is not correctly parsed and is taken as plain text. > Is there anything wrong with my training configuration or training corpus? > Thanks a lot. Hi Steven, The Moses XML format isn't pure XML and still cares about whitespace. Each sentence should be

[Moses-support] cfp for 20th International Conference on Application of Natural Language to Information Systems (NLDB'15)

2014-12-04 Thread Michael Zock
** apologies for cross-posting ** Call for Papers: 20th International Conference on Application of Natural Language to Information Systems (NLDB'15) Conference website: http://nldb2015.org/ NLDB 2015 invites researchers from academia and industry to submit papers for oral or poster pres