On 05.10.2016 at 20:04, Nick Burch wrote:
On Wed, 5 Oct 2016, Ingo Siebert wrote:
I just used Tika (org.apache.tika:tika-parsers:1.13) to parse an
e-mail with multipart/mixed content.
How do you want to get the various parts back? All text inlined, or a
special callback for each part? What about the metadata for the parts?
An MS Office document also consists of several parts and chapters, and I
get them as one string.
The metadata are not of interest to me.
At least for my use case it would be sufficient to get the data
concatenated into one string, but it would also be nice to get the parts
separately.
The parsing result of Tika is the file in plain text, including all
headers and boundary elements.
The words in the attachment are also not parsed.
Is this the defined behaviour of Tika?
It is, if you don't tell Tika to recurse into embedded resources.
Could you please give me a hint as to what I have to do to achieve that?
Nick
Thank you for your answer.
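
A minimal sketch of what "telling Tika to recurse into embedded resources" can look like with the Tika 1.x Java API, assuming tika-parsers is on the classpath. The class name RecursiveExtract and the method name extractText are hypothetical and chosen only for illustration. The idea is to register a Parser in the ParseContext, so that embedded documents such as mail attachments are handed back to that parser and their text ends up in the same handler as the container's text, i.e. everything concatenated into one string.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;

// Hypothetical helper class, not part of Tika itself.
public class RecursiveExtract {

    // Parses the stream and returns the text of the container together
    // with the text of its embedded resources as one string.
    public static String extractText(InputStream in) throws Exception {
        Parser parser = new AutoDetectParser();

        // Registering a Parser in the ParseContext is what lets Tika
        // recurse into embedded resources (attachments, embedded files).
        ParseContext context = new ParseContext();
        context.set(Parser.class, parser);

        // -1 disables the default write limit of 100,000 characters.
        BodyContentHandler handler = new BodyContentHandler(-1);
        parser.parse(in, handler, new Metadata(), context);
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the path of the file to parse, e.g. a saved e-mail.
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(extractText(in));
        }
    }
}
```

This covers the "all text in one string" case. For getting the parts separately, Tika also ships a RecursiveParserWrapper class that collects one Metadata object per embedded document; its exact usage depends on the Tika version, so it is not shown here.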