On 01/07/2012 17:31, Mattmann, Chris A (388J) wrote:
> Hi Jason,
>
> On Jul 1, 2012, at 6:05 AM, Jason Judge wrote:
>
>> I see, so tika-app in server mode and tika-server are not the same thing.
>> tika-app in server mode is just a way of providing an alternative input
>> stream, but offers no control through that stream over what it actually does.
>>
>> I have downloaded the tika-server and it works like a charm.
> Glad to hear it's working for ya!
>
>> The one thing I can't see how to do, is how to detect the language. The
>> language is neither in the text nor in the metadata. Would I need to fetch
>> the XHTML version of the document and get the language out of the header
>> section? Not sure how to fetch the XHTML TBH - the documentation only covers
>> plain text.
> I don't think we added a language detection end point yet, but it's certainly
> something we should do.
>
> In case we don't get to it as soon as you get a chance to, feel free to
> contribute it back by:
>
> 1. filing an issue in our JIRA system at:
> https://issues.apache.org/jira/browse/TIKA to record the desire for the
> language detection end point
> 2. submitting a patch and/or working with the committers on that issue you
> create in #1.
>
> HTH!
>
> Cheers,
> Chris
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: [email protected]
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Chris,
I've raised https://issues.apache.org/jira/browse/TIKA-944 - hopefully not scoped too wide.

-- Jason
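
For anyone following the thread, here is a minimal sketch of the plain-text extraction call that already works against tika-server, which a language detection end point would presumably sit alongside. It assumes a tika-server instance listening on its default port 9998 on localhost and a UTF-8 response body. The helper names `build_text_request` and `extract_text` are made up for illustration and are not part of Tika.

```python
import urllib.request

# Assumption: tika-server is running locally on its default port (9998).
TIKA_SERVER_URL = "http://localhost:9998/tika"


def build_text_request(document_bytes, url=TIKA_SERVER_URL):
    """Build a PUT request that sends raw document bytes to the /tika resource.

    Hypothetical helper: it only constructs the request, so it can be
    inspected without a server running.
    """
    return urllib.request.Request(
        url,
        data=document_bytes,
        method="PUT",
        headers={"Accept": "text/plain"},
    )


def extract_text(document_bytes, url=TIKA_SERVER_URL):
    """Send the document to tika-server and return the extracted plain text."""
    request = build_text_request(document_bytes, url)
    with urllib.request.urlopen(request) as response:
        # Assumption: the server answers in UTF-8; undecodable bytes are replaced.
        return response.read().decode("utf-8", errors="replace")
```

With a server running, `extract_text(open("report.pdf", "rb").read())` would return the document's text. The language would still have to be determined separately until something like TIKA-944 is implemented.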
