Hi,

I have been playing around with the CLI tool and tried to detect the language of the test files, e.g. the en.test located at tika/tika-core/src/test/resources/org/apache/tika/language/en.test

lap:tika janhoy$ java -jar tika-app/target/tika-app-0.8-SNAPSHOT.jar -m tika-core/src/test/resources/org/apache/tika/language/en.test
Content-Encoding: UTF-8
Content-Length: 22427
Content-Type: text/plain
resourceName: en.test

As you can see, no language was detected. Now I make a copy, convert it from UTF-8 to ISO-8859-1, and try again:

lap:tika janhoy$ iconv -f UTF-8 -t ISO-8859-1 <tika-core/src/test/resources/org/apache/tika/language/en.test >en-iso.txt
lap:tika janhoy$ java -jar tika-app/target/tika-app-0.8-SNAPSHOT.jar -m en-iso.txt
Content-Encoding: ISO-8859-1
Content-Language: en
Content-Length: 22417
Content-Type: text/plain
language: en
resourceName: en-iso.txt

This time it is detected as English. The same is true for the other test language files: the language is not detected for UTF-8 encoded files.

Does anyone see what's wrong?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com
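
A quick sanity check on the two files is to compare their byte sizes. The sketch below is not part of Tika: `size_diff` is a hypothetical helper name, and it assumes `iconv` and `wc` are on the PATH. Every character in the range U+0080 to U+00FF takes two bytes in UTF-8 and one byte in ISO-8859-1, so the size difference should equal the number of non-ASCII characters in the file, assuming iconv converts it without errors.

```shell
# Hypothetical helper (not a Tika command): print how many bytes a UTF-8 file
# loses when converted to ISO-8859-1.
size_diff() {
  # Size of the file as stored (assumed to be valid UTF-8).
  utf8_size=$(wc -c < "$1")
  # Size of the same content after conversion to ISO-8859-1.
  latin1_size=$(iconv -f UTF-8 -t ISO-8859-1 < "$1" | wc -c)
  # Shell arithmetic tolerates the leading spaces some wc versions print.
  echo $((utf8_size - latin1_size))
}

# Example call, using the path from the message above:
# size_diff tika-core/src/test/resources/org/apache/tika/language/en.test
```

The Content-Length values reported above (22427 and 22417) differ by 10, which suggests en.test contains ten non-ASCII characters, so the UTF-8 and ISO-8859-1 copies really are different byte sequences and not just relabelled ASCII.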
