On 01/07/2012 17:31, Mattmann, Chris A (388J) wrote:
> Hi Jason,
>
> On Jul 1, 2012, at 6:05 AM, Jason Judge wrote:
>
>> I see, so tika-app in server mode and tika-server are not the same thing. 
>> tika-app in server mode is just a way of providing an alternative input 
>> stream, but offers no control through that stream over what it actually does.
>>
>> I have downloaded the tika-server and it works like a charm.
> Glad to hear it's working for ya!
>
>> The one thing I can't see how to do is detect the language. The 
>> language is neither in the text nor in the metadata. Would I need to fetch 
>> the XHTML version of the document and get the language out of the header 
>> section? Not sure how to fetch the XHTML TBH - the documentation only covers 
>> plain text.
> I don't think we added a language detection end point yet, but it's certainly
> something we should do.
>
> In case you get a chance to work on it before we do, feel free to 
> contribute it back by:
>
> 1. filing an issue in our JIRA system at: 
> https://issues.apache.org/jira/browse/TIKA to record the desire for the 
> language detection end point
> 2. submitting a patch and/or working with the committers on the issue you 
> create in step 1.
>
> HTH!
>
> Cheers,
> Chris
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: [email protected]
> WWW:   http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
Chris,

I've raised https://issues.apache.org/jira/browse/TIKA-944 - hopefully not
scoped too wide.

-- Jason
