I am using the Tika 2.7 server standard runnable jar, which I believe
has a built-in language detection feature. Do we need to do any other
configuration, or install any other extension, to get language
detection as mentioned below?

[image: image.png]

Please assist.
Thanks
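
One way to check whether the running server can detect language at all,
independent of what rmeta/text returns, is to send it a short piece of
plain text directly. The following is only a minimal sketch under stated
assumptions: the host and port (localhost:9998), the "/language/string"
resource path, the use of PUT, and the helper names are all assumptions
that are not confirmed anywhere in this thread and should be checked
against the resource list your own Tika server instance reports.

```python
import urllib.request

# Assumption: tika-server is listening locally on port 9998.
TIKA_URL = "http://localhost:9998"


def build_language_request(text, base_url=TIKA_URL):
    """Build a PUT request asking the server to identify the language of `text`.

    The "/language/string" path is an assumption; verify it against the
    resources your server actually exposes.
    """
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/language/string",
        data=text.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; charset=UTF-8"},
    )


def detect_language(text, base_url=TIKA_URL, timeout=30):
    """Send the request and return the response body as stripped text."""
    request = build_language_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()
```

With a server running, calling detect_language() on a Spanish sentence
would show whether a language code comes back. If it does, detection
itself is available and the open question is only whether rmeta/text
includes it in the metadata.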

On Fri, Apr 14, 2023, 22:05 Chetan Bikire <[email protected]> wrote:

> I didn't find any language metadata either, but I thought the Tika
> language detector extension would be able to provide it.
>
> org.apache.tika.language.detect.LanguageDetector
>
> On Wed, Apr 12, 2023, 22:38 Tim Allison <[email protected]> wrote:
>
>> I'm not seeing language hints in the document.xml within the docx nor
>> in the metadata.  Do you know where it might be stored inside the
>> docx?
>>
>> On Wed, Apr 12, 2023 at 1:01 PM Chetan Bikire <[email protected]>
>> wrote:
>> >
>> > I am calling Tika through the rmeta/text endpoint of Tika server 2.7.
>> > Yes, by language detection I mean any metadata field that shows the
>> > language in which the document is written.
>> > For example, in our case the attached document contains Spanish
>> > content, so we expect metadata such as Content-Language:"es"
>> >
>> >
>> >
>> > On Wed, Apr 12, 2023 at 8:32 PM Tim Allison <[email protected]>
>> wrote:
>> >>
>> >> How are you calling Tika?  By "language", do you mean language
>> >> detection on the extracted text or an internal metadata flag that says
>> >> "I'm X language"?
>> >>
>> >> On Wed, Apr 12, 2023 at 10:48 AM Chetan Bikire <[email protected]>
>> wrote:
>> >> >
>> >> > Hi,
>> >> >
>> >> > After parsing, Tika does not return the language as part of the
>> >> > parsing result for some documents, such as .docx and .msg files.
>> >> > An example document is below.
>> >> > Please assist.
>>
>
