Hi, we're using the /tika endpoint of the Tika Docker image with text/plain to extract file content for indexing in Elastic.
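For illustration, a request of this kind might look like the minimal sketch below. The host, the port (9998 is the tika-server default), the use of PUT, and the helper names `build_tika_request` and `extract_text` are assumptions for the example, not details of our actual setup.

```python
import urllib.request

# Assumed values for illustration only; adjust to the actual deployment.
TIKA_URL = "http://localhost:9998/tika"  # 9998 is tika-server's default port


def build_tika_request(payload: bytes, url: str = TIKA_URL) -> urllib.request.Request:
    """Build a PUT request asking tika-server for plain-text output."""
    return urllib.request.Request(
        url,
        data=payload,
        method="PUT",
        headers={"Accept": "text/plain"},
    )


def extract_text(path: str, timeout: float = 60.0) -> str:
    """Send the file at `path` to the server and return the extracted text."""
    with open(path, "rb") as fh:
        request = build_tika_request(fh.read())
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With a .msg container, the text returned by a call like `extract_text("mail.msg")` includes the text of every attachment as well as the message body.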
Say I have a 10 MB .msg file with 10 .docx and PDF attachments. I only need to extract the text from the .msg body, not from any of the attachments, since those are extracted from the .msg and handled separately. The request currently times out because there is a massive amount of text to process.

I can't find any good examples of this for /tika, even on the excellent wiki at https://cwiki.apache.org/confluence/display/TIKA/TikaServer. Everything I try results in the attachments being part of the extraction output. I see there is a POST to /tika/form/main, which sounds promising, but I can't get it to work. Using /rmeta does it, but as JSON/HTML, and ideally we only need the file content as plain text.

Any ideas would be greatly appreciated!

Regards,
Willy Koch
