The CHANGES.txt document of Tika 1.2 mentions that *Tika now returns the detected character encoding as a "charset" parameter of the content type metadata field for text/plain and text/html documents. For example, instead of just "text/plain", the returned content type will be something like "text/plain; charset=UTF-8" for a UTF-8 encoded text document.*
However, when I process a set of plain-text (ASCII) files (some IETF RFCs), the returned type is still just "text/plain", without any charset information. The code I am using to detect the content type of each file is something like:

    Tika tika = new Tika();
    InputStream is = TikaInputStream.get(new FileInputStream(file));
    System.out.println(tika.detect(is));

The output is still "text/plain", the same as in previous versions of Tika. Is that the expected behavior?
