Hello,

I recently posted some messages about ISO-8859-2 encoding problems.
Trying to solve that problem, I encoded the Latin-2 XML document as UTF-8 and did an AddDocument to Xindice. The behaviour is similar: the characters which happen to be in ISO-8859-1 as well (è, â, î) are all right, while the ones that are specific to ISO-8859-2 are replaced by "?". This happens in the very file where Xindice holds its database.

This is probably caused by a Writer being opened somewhere in the I/O part of Xindice without an encoding being specified (I have not yet found the code which actually does this). Since the platform default encoding is often ISO-8859-1, the Latin-2 text is improperly handled.

Indeed, one solution is to change the file.encoding property for Java. For instance, if I call java this way:

    java -Dfile.encoding=utf-8

the problem disappears: the Latin-2 text is stored as UTF-8 in the Xindice db, which is OK for me.

I wonder whether it would be more proper to allow the user to choose the encoding in which his text will be stored, and do something like:

    Writer writer = new ...Writer(outputStream, "my-encoding-here")

in the I/O code of Xindice. Or, even better, look at the declaration

    <?xml version="1.0" encoding="my-encoding-here"?>

and use the given encoding when storing the document into Xindice. Otherwise, most users will end up using the default encoding of their machine without knowing it.

Best regards,
Adrian
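
P.S. To illustrate the effect I mean, here is a minimal, self-contained sketch. It is not Xindice code: the class EncodingSketch and its write helper are made up for this example, and an in-memory stream stands in for the database file. It writes one character that exists in both ISO-8859-1 and ISO-8859-2 (â) and one that exists only in ISO-8859-2 (ă), first through an ISO-8859-1 Writer, then through a UTF-8 Writer.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class EncodingSketch {
    // Hypothetical helper: writes text through a Writer that uses the named
    // encoding and returns the bytes that would end up in the file.
    public static byte[] write(String text, String encoding) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, encoding)) {
            writer.write(text);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // \u00e2 (a with circumflex) is in both ISO-8859-1 and ISO-8859-2;
        // \u0103 (a with breve) is in ISO-8859-2 but not in ISO-8859-1.
        String text = "\u00e2\u0103";

        // Round trip through ISO-8859-1: the unmappable character becomes "?".
        String viaLatin1 = new String(write(text, "ISO-8859-1"), StandardCharsets.ISO_8859_1);

        // Round trip through UTF-8: both characters survive.
        String viaUtf8 = new String(write(text, "UTF-8"), StandardCharsets.UTF_8);

        System.out.println(viaLatin1.equals(text)); // false
        System.out.println(viaUtf8.equals(text));   // true
    }
}
```

Passing the encoding explicitly, as in the write helper above, makes the result independent of the file.encoding setting of the machine.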