Re: Lucene Unicode Usage

2005-02-10 Thread Andrzej Bialecki
Owen Densmore wrote: I'm building an index from a FileMaker database by dumping the data to a tab-separated file. Because the FileMaker output is encoded in MacRoman, and uses Mac line separators, I run a script across the tab file to clean it up: tr '\r\v' '\n ' | iconv -f MAC -t UTF-8

Lucene Unicode Usage

2005-02-09 Thread Owen Densmore
I'm building an index from a FileMaker database by dumping the data to a tab-separated file. Because the FileMaker output is encoded in MacRoman, and uses Mac line separators, I run a script across the tab file to clean it up: tr '\r\v' '\n ' | iconv -f MAC -t UTF-8 This basically converts

Re: Lucene Unicode Usage

2005-02-09 Thread aurora
So you have a UTF-8 encoded text file. But how do you read the file into Java? Java's default encoding is likely to be something other than UTF-8. Make sure you specify the encoding explicitly, like: new InputStreamReader(new FileInputStream(filename), "UTF-8"); On Wed, 9 Feb 2005 22:32:38 -0700, Owen
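
A minimal sketch of the suggestion above, assuming the cleaned-up tab file really is UTF-8. The class name Utf8FileReader and the helper readAll are hypothetical, introduced only for illustration; the one call taken from the thread is the InputStreamReader constructor with an explicit charset name. Note that the charset name must be passed as a string.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;

// Hypothetical helper class, not part of Lucene or of the original thread.
class Utf8FileReader {

    // Reads the whole file into a String, decoding the bytes as UTF-8
    // instead of relying on the platform default encoding.
    static String readAll(String filename) throws IOException {
        Reader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(filename), "UTF-8"));
        try {
            StringBuilder text = new StringBuilder();
            char[] buffer = new char[4096];
            int count;
            // read() returns -1 once the end of the stream is reached
            while ((count = reader.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
            return text.toString();
        } finally {
            reader.close();
        }
    }
}
```

The same Reader could presumably be handed to whatever code splits the tab-separated records before indexing, so that non-ASCII characters are decoded once, at the point where the file is opened.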