Hi,
I've tried exactly the same code in two scenarios:

Tika tika = new Tika();
Metadata metadata = new Metadata();
Reader reader = tika.parse(new File("..."));
FileWriter fw = new FileWriter(new File("..."));
int data = reader.read();
StringBuilder sb = new StringBuilder();
// Copy the extracted text one character at a time.
while (data != -1) {
    char dataChar = (char) data;
    sb.append(dataChar);
    fw.write(dataChar);
    data = reader.read();
}
// Flush and release both streams.
fw.close();
reader.close();

When I put this code in a simple Java project with tika-app-1.4.jar as a
dependency, it
generates UTF-8 output (correct).
When I put this code inside a bundle with *tika-bundle* and *tika-core* as
dependencies and deploy it
inside Karaf, it generates ANSI output (wrong; non-ASCII characters are garbled).
Both projects are managed with Maven and Eclipse 4.2.
Do I have to set something additionally, or should I embed tika-app inside my
bundle (using
maven-bundle-plugin)?
I'm using Tika 1.4, Java 1.6.45, Win 7 x64 and Karaf 2.3.3.
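
My only guess so far (an assumption, I have not verified it inside Karaf): FileWriter always encodes with the JVM's default charset, and that default may differ between the Eclipse launch and the Karaf process. Below is a minimal sketch of writing with an explicit charset instead. The class and method names are made up for illustration, and a StringReader stands in for the Reader returned by tika.parse(...), so the snippet runs without Tika:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.StringReader;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical helper class, not part of Tika.
class ExplicitCharsetCopy {

    // Copies all characters from the reader into the target file,
    // encoding them as UTF-8 regardless of the JVM default charset.
    static void copyAsUtf8(Reader reader, File target) throws IOException {
        Writer writer = new OutputStreamWriter(new FileOutputStream(target), "UTF-8");
        try {
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, n);
            }
        } finally {
            try {
                writer.close();
            } finally {
                reader.close();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // This is the charset a plain FileWriter would use in this JVM.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // Assumption: in the real code this Reader would come from tika.parse(...).
        Reader reader = new StringReader("\u010d\u0107\u017e");
        File out = File.createTempFile("tika-out", ".txt");
        copyAsUtf8(reader, out);
        System.out.println("Wrote " + out);
    }
}
```

Printing Charset.defaultCharset() in both environments should show whether the defaults really differ. If they do, the explicit "UTF-8" writer would make the output independent of where the code runs.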
--
Bratislav Stojanovic, M.Sc.