Hi Heidi,

line 157 in tokenizer.perl is fine; the Unicode non-character appears in line 3414186 of your input file, which is read from STDIN...
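If it helps to see exactly where the character sits, something along these lines might do. It is only a rough sketch, not part of Moses: the function names and the file name `corpus.en` are made up for illustration, and it assumes your merged file is UTF-8.

```python
import sys

# The code point named in the Perl error message.
NONCHAR = "\ufdd3"


def find_nonchar(lines, target=NONCHAR):
    """Yield (line_number, column) for every occurrence of target, both 1-based."""
    for lineno, line in enumerate(lines, start=1):
        col = line.find(target)
        while col != -1:
            yield lineno, col + 1
            col = line.find(target, col + 1)


def strip_nonchar(lines, target=NONCHAR):
    """Yield each line with every occurrence of target removed."""
    for line in lines:
        yield line.replace(target, "")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python find_nonchar.py corpus.en
    path = sys.argv[1]
    with open(path, encoding="utf-8", errors="replace") as fh:
        for lineno, col in find_nonchar(fh):
            print(f"{path}:{lineno}:{col}")
```

Running it on the merged file should print one `file:line:column` entry per occurrence, so you can check whether line 3414186 is the only affected line. `strip_nonchar` could then be used to write a cleaned copy before tokenizing, if simply dropping the character is acceptable for your data.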
Check your input file for U+FDD3 :)

Cheers and best,
  Christian

On 15.07.2013, at 18:30, Heidi Heweidy wrote:

> Hello,
> I am using the English portion of the Multilingual UN Parallel Text
> http://www.euromatrixplus.net/multi-un/
> As Moses works with only one file, I appended all files of one year and
> removed the xml tags.
> Now, when trying to tokenize, I got this error after taking some time:
> Unicode non-character U+FDD3 is illegal for open interchange at
> /home/tjr/mosesdecoder/scripts/tokenizer/tokenizer.perl line 157, <STDIN>
> line 3414186.
>
> I opened the tokenizer.perl and checked line 157 and it was the line having "
> print &tokenize($_); " in:
>
> while(<STDIN>)
> {
>     if (($SKIP_XML && /^<.+>$/) || /^\s*$/)
>     {
>         #don't try to tokenize XML/HTML tag lines
>         print $_;
>     }
>     else
>     {
>         print &tokenize($_);
>     }
> }
> }
>
> As for the <STDIN> line 3414186, I don't know how I can access that or what
> the problem might be. Any help?
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support

--
Dipl.-Inf. Christian Federmann, Researcher, Language Technology Lab
Office +1.09 -- Phone +49-681/857-75-5353, Fax +49-681/857-75-5338
DFKI GmbH, Campus D3 2, Stuhlsatzenhausweg 3, 66123 Saarbruecken
http://www.dfki.de/~cfedermann
-------------------------------------------------------------------
Deutsches Forschungszentrum fuer Kuenstliche Intelligenz GmbH
Trippstadter Strasse 122, D-67663 Kaiserslautern, Germany
Geschaeftsfuehrung:
Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster (Vorsitzender)
Dr. Walter Olthoff
Vorsitzender des Aufsichtsrats:
Prof. Dr. h.c. Hans A. Aukes
Amtsgericht Kaiserslautern, HRB 2313
-------------------------------------------------------------------