Thank you, I will look into this.

On Mon, Apr 17, 2023, 20:53 Tim Allison <[email protected]> wrote:

> Two options:
> 1) send the extracted text to the /language endpoint (
> https://cwiki.apache.org/confluence/display/TIKA/TikaServer#TikaServer-LanguageResource
> ).
> 2) If you are using the /rmeta endpoint or the json output from the /tika
> endpoint, you can get language id from a slightly different lang id
> mechanism via tika-eval.  Add the tika-eval.jar to your class path (see:
> https://cwiki.apache.org/confluence/display/TIKA/TikaServer 's section
> titled "Integration with tika-eval").
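
A minimal Python sketch of the two options above, not an authoritative client. It assumes a tika-server listening on the default http://localhost:9998, that the language resource accepts a PUT of plain text at /language/string, and that tika-eval writes its result under a metadata key named "tika-eval:lang". The key name and the helper function names are assumptions for illustration; check them against your own server's output.

```python
import json
import urllib.request

TIKA_URL = "http://localhost:9998"  # assumption: default tika-server host and port


def build_language_request(text, base_url=TIKA_URL):
    """Build a PUT request that sends already-extracted text to /language/string."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/language/string",
        data=text.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; charset=utf-8"},
    )


def detect_language(text, base_url=TIKA_URL):
    """Option 1: ask the server for a language code (requires a running server)."""
    with urllib.request.urlopen(build_language_request(text, base_url)) as resp:
        return resp.read().decode("utf-8").strip()


def language_from_rmeta(rmeta_json, key="tika-eval:lang"):
    """Option 2: read the language from /rmeta JSON output.

    The key name is an assumption; it is only present when tika-eval is on
    the server's class path. Returns None if the key or record is missing.
    """
    records = json.loads(rmeta_json)
    # /rmeta returns a list of metadata objects; the first is the container document.
    return records[0].get(key) if records else None
```

With a server running, detect_language(extracted_text) would cover option 1, while language_from_rmeta(response_body) would be applied to the JSON already returned by the /rmeta/text call.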
>
> On Mon, Apr 17, 2023 at 8:16 AM Chetan Bikire <[email protected]> wrote:
>
>> I am using the standard runnable jar of Tika Server 2.7, which I believe
>> has a built-in language detection feature. Do we need any other
>> configuration, or do we need to install another extension, in order to get
>> language detection as shown below?
>>
>> [image: image.png]
>>
>> Please assist.
>> Thanks
>>
>> On Fri, Apr 14, 2023, 22:05 Chetan Bikire <[email protected]> wrote:
>>
>>> I didn't find any language metadata either, but I thought the Tika
>>> language detector extension might be able to provide it.
>>>
>>> org.apache.tika.language.detect.LanguageDetector
>>>
>>> On Wed, Apr 12, 2023, 22:38 Tim Allison <[email protected]> wrote:
>>>
>>>> I'm not seeing language hints in the document.xml within the docx nor
>>>> in the metadata.  Do you know where it might be stored inside the
>>>> docx?
>>>>
>>>> On Wed, Apr 12, 2023 at 1:01 PM Chetan Bikire <[email protected]>
>>>> wrote:
>>>> >
>>>> > I am calling Tika through the rmeta/text endpoint of Tika Server 2.7.
>>>> > Yes, by language detection I mean any metadata field that shows the
>>>> > language in which the document is written.
>>>> > For example, in our case the attached document contains Spanish
>>>> > content, so the metadata would be Content-Language:"es"
>>>> >
>>>> >
>>>> >
>>>> > On Wed, Apr 12, 2023 at 8:32 PM Tim Allison <[email protected]>
>>>> wrote:
>>>> >>
>>>> >> How are you calling Tika?  By "language", do you mean language
>>>> >> detection on the extracted text or an internal metadata flag that
>>>> >> says "I'm X language"?
>>>> >>
>>>> >> On Wed, Apr 12, 2023 at 10:48 AM Chetan Bikire <[email protected]>
>>>> wrote:
>>>> >> >
>>>> >> > Hi,
>>>> >> >
>>>> >> > After parsing, Tika does not return the language as part of the
>>>> >> > parsing result for some documents, such as .docx and .msg files.
>>>> >> > Below is an example document.
>>>> >> > Please assist.
>>>>
>>>
