I am calling Tika through the rmeta/text endpoint of Tika Server 2.7. Yes, by language detection I mean any metadata field that indicates the language the document is written in. For example, the attached document contains Spanish content, so I would expect the metadata to include Content-Language: "es".
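For anyone trying to reproduce this, here is a minimal sketch of the kind of call described above. It assumes a Tika server listening on http://localhost:9998 (the default port, adjust as needed); the helper names `fetch_rmeta_text` and `language_fields` and the file name `example.docx` are made up for illustration. It only lists whichever language-related metadata keys the server actually returns, so it makes it easy to see which files come back without one.

```python
import json
import urllib.request

# Assumption: tika-server is running locally on its default port.
TIKA_URL = "http://localhost:9998"


def fetch_rmeta_text(path, base_url=TIKA_URL):
    """PUT a file to /rmeta/text and return the parsed JSON list
    (one metadata object per document or embedded document)."""
    with open(path, "rb") as f:
        data = f.read()
    req = urllib.request.Request(
        base_url + "/rmeta/text",
        data=data,
        method="PUT",
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def language_fields(rmeta):
    """For each metadata object, collect the keys whose name mentions
    'language' (for example Content-Language), with their values."""
    found = []
    for index, part in enumerate(rmeta):
        hits = {k: v for k, v in part.items() if "language" in k.lower()}
        found.append((index, hits))
    return found


if __name__ == "__main__":
    # Hypothetical file name; replace with the document being tested.
    for index, hits in language_fields(fetch_rmeta_text("example.docx")):
        print(index, hits if hits else "no language field returned")
```

Running this against a .docx and a .msg file side by side should show directly whether any language key is present in the rmeta output for each.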
On Wed, Apr 12, 2023 at 8:32 PM Tim Allison <[email protected]> wrote:
> How are you calling Tika? By "language", do you mean language
> detection on the extracted text or an internal metadata flag that says
> "I'm X language"?
>
> On Wed, Apr 12, 2023 at 10:48 AM Chetan Bikire <[email protected]>
> wrote:
> >
> > Hi,
> >
> > After parsing documents tika does not return language as part parsing
> > result for some of the documents like docx,.msg files.
> > Below is the example document.
> > please assist.
>
