[ http://issues.apache.org/jira/browse/NUTCH-145?page=all ]

Sami Siren resolved NUTCH-145:
------------------------------
    Fix Version: 0.8-dev
     Resolution: Fixed
      Assign To: Sami Siren

This is now committed, thanks.

> build of war file fails on Chinese (zh) .xml files due to UTF-8 BOM
> -------------------------------------------------------------------
>
>          Key: NUTCH-145
>          URL: http://issues.apache.org/jira/browse/NUTCH-145
>      Project: Nutch
>         Type: Bug
>   Components: web gui
>     Versions: 0.8-dev
>  Environment: Windows XP, Cygwin, Eclipse, JDK 1.4.1
>     Reporter: KuroSaka TeruHiko
>     Assignee: Sami Siren
>     Priority: Minor
>      Fix For: 0.8-dev
>  Attachments: NUTCH-145-fix.zip
>
> When I ran the ant build from within Eclipse, it failed on
> src/web/include/zh/header.xml and src/web/pages/zh/*.xml because "document
> does not have a root element" (translated from the Japanese message).
> On closer inspection, these files begin with an invisible Unicode UTF-8 BOM
> character, that is EF BB BF in hex, or \357\273\277 in octal.
> Perhaps the JDK 1.4.x UTF-8 converter does not handle the BOM for UTF-8 files.
> (Note that the BOM was originally intended for the UTF-16 and UTF-32
> encodings, to self-identify their endianness, but Microsoft started using
> a UTF-8-encoded BOM as a character encoding signature.)
> I also noticed that they use the MS-DOS style end-of-line sequence, CR
> followed by LF, unlike the other ??/*.xml files, which use UNIX style EOL.
> Fixed files are available.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
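
For illustration, the cleanup the report describes (dropping a leading EF BB BF and converting CR LF to LF) could be sketched roughly as below. This is a sketch only, not the contents of NUTCH-145-fix.zip and not part of the Nutch build. It is written in Python rather than Java for brevity, and the function names has_utf8_bom and strip_bom_and_normalize_eol are hypothetical.

```python
import codecs

# The UTF-8 encoding of U+FEFF: EF BB BF in hex, \357\273\277 in octal.
UTF8_BOM = codecs.BOM_UTF8


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 BOM."""
    return data.startswith(UTF8_BOM)


def strip_bom_and_normalize_eol(data: bytes) -> bytes:
    """Remove a leading UTF-8 BOM, if present, and convert CR LF to LF.

    Hypothetical helper: it works on raw bytes so that no decoder has a
    chance to turn the BOM into a stray character before the XML prolog.
    """
    if has_utf8_bom(data):
        data = data[len(UTF8_BOM):]
    return data.replace(b"\r\n", b"\n")
```

One possible use would be to read each affected file (for example those matching src/web/pages/zh/*.xml) in binary mode, pass the bytes through strip_bom_and_normalize_eol, and write the result back only if it differs from the original.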