Hi Heidi,

line 157 in tokenizer.perl is fine. The Unicode non-character appears in line 
3414186 of your input file, which the script reads from STDIN.

Check your input file for U+FDD3 :)
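
In case it helps, here is a minimal sketch of how one could locate such code points and drop them before tokenizing. It is written in Python rather than Perl, the function names are made up for illustration, and it assumes the lines have already been decoded as UTF-8; treat it as a starting point, not as part of the Moses scripts.

```python
def is_noncharacter(ch: str) -> bool:
    """Return True for code points that Unicode reserves as non-characters."""
    cp = ord(ch)
    # U+FDD0..U+FDEF, plus the last two code points of every plane
    # (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ...).
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def find_noncharacters(lines):
    """Yield (line number, column, 'U+XXXX') for every non-character found.

    Line and column numbers are 1-based, so the line number matches the
    '<STDIN> line N' reported by Perl.
    """
    for lineno, line in enumerate(lines, start=1):
        for col, ch in enumerate(line, start=1):
            if is_noncharacter(ch):
                yield lineno, col, f"U+{ord(ch):04X}"


def strip_noncharacters(line: str) -> str:
    """Return the line with all non-characters removed."""
    return "".join(ch for ch in line if not is_noncharacter(ch))
```

One way to use it would be to open the concatenated corpus with open(path, encoding="utf-8"), pass the file object to find_noncharacters() to see which lines are affected, and then write strip_noncharacters(line) for each line to a new file that you feed to the tokenizer.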

Cheers and best,
   Christian



On 15.07.2013, at 18:30, Heidi Heweidy wrote:

> Hello,
> I am using the English portion of the Multilingual UN Parallel Text 
> http://www.euromatrixplus.net/multi-un/
> As Moses works with only one file, I appended all files of one year and 
> removed the xml tags.
> Now, when trying to tokenize, I got this error after taking some time:
> Unicode non-character U+FDD3 is illegal for open interchange at 
> /home/tjr/mosesdecoder/scripts/tokenizer/tokenizer.perl line 157, <STDIN> 
> line 3414186.
> 
> I opened tokenizer.perl and checked line 157; it is the line containing 
> "print &tokenize($_);" in:
> 
> while(<STDIN>)
>     {
>         if (($SKIP_XML && /^<.+>$/) || /^\s*$/)
>         {
>             #don't try to tokenize XML/HTML tag lines
>             print $_;
>         }
>         else
>         {
>             print &tokenize($_);
>         }
>     }
> }
> 
> As for the <STDIN> line 3414186, I don't know how I can access that or what 
> the problem might be. Any help?
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support




--
Dipl.-Inf. Christian Federmann, Researcher, Language Technology Lab
Office +1.09 -- Phone +49-681/857-75-5353,  Fax +49-681/857-75-5338
DFKI GmbH,  Campus D3 2,  Stuhlsatzenhausweg 3,  66123 Saarbruecken
http://www.dfki.de/~cfedermann

-------------------------------------------------------------------
Deutsches Forschungszentrum fuer Kuenstliche Intelligenz GmbH
Trippstadter Strasse 122, D-67663 Kaiserslautern, Germany
Geschaeftsfuehrung:
Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster (Vorsitzender)
Dr. Walter Olthoff
Vorsitzender des Aufsichtsrats:
Prof. Dr. h.c. Hans A. Aukes
Amtsgericht Kaiserslautern, HRB 2313
-------------------------------------------------------------------


