I'm not seeing language hints either in document.xml within the docx or in the metadata. Do you know where they might be stored inside the docx?
On Wed, Apr 12, 2023 at 1:01 PM Chetan Bikire <[email protected]> wrote:
>
> I am calling Tika via the rmeta/text endpoint, running Tika Server 2.7.
> Yes, by language detection I mean any metadata field that shows the
> language in which the document is written.
> For example, the attached document contains Spanish content, so I would
> expect metadata such as Content-Language:"es".
>
> On Wed, Apr 12, 2023 at 8:32 PM Tim Allison <[email protected]> wrote:
>>
>> How are you calling Tika? By "language", do you mean language
>> detection on the extracted text or an internal metadata flag that says
>> "I'm X language"?
>>
>> On Wed, Apr 12, 2023 at 10:48 AM Chetan Bikire <[email protected]> wrote:
>> >
>> > Hi,
>> >
>> > After parsing, Tika does not return the language as part of the parsing
>> > result for some documents, such as .docx and .msg files.
>> > Below is the example document.
>> > Please assist.
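A minimal sketch for checking what the rmeta/text endpoint actually returns. It assumes a Tika Server 2.x listening on the default port 9998. It PUTs a file and lists, for the container document and each attachment, any metadata key whose name mentions "language" (for example Content-Language or dc:language). Which key appears, if any, depends on the parser. The helper names fetch_rmeta_text and language_fields are hypothetical.

```python
import json
import sys
import urllib.request
from typing import Any


def fetch_rmeta_text(path: str, server: str = "http://localhost:9998") -> list[dict[str, Any]]:
    """PUT a file to the Tika Server /rmeta/text endpoint and return the JSON list.

    Assumption: Tika Server 2.x is running locally on the default port 9998.
    """
    with open(path, "rb") as fh:
        data = fh.read()
    req = urllib.request.Request(
        f"{server}/rmeta/text",
        data=data,
        method="PUT",
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def language_fields(rmeta: list[dict[str, Any]]) -> list[tuple[int, dict[str, Any]]]:
    """For each entry (index 0 is the container file), collect keys containing 'language'."""
    results = []
    for index, doc in enumerate(rmeta):
        langs = {key: value for key, value in doc.items() if "language" in key.lower()}
        results.append((index, langs))
    return results


if __name__ == "__main__":
    # Usage: python check_lang.py example.docx
    for index, langs in language_fields(fetch_rmeta_text(sys.argv[1])):
        print(index, langs or "no language metadata")
```

An empty result for the .docx entry would suggest the parser found no language flag to report. In that case, detecting the language of the extracted text would be a separate step.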
