I see, so tika-app in server mode and tika-server are not the same thing. tika-app in server mode is just a way of providing an alternative input stream, but offers no control through that stream over what it actually does.
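For reference, the "alternative input stream" behaviour can be sketched as a plain socket client: send the document bytes, signal end of input, and read back whatever the server writes. This is only a rough sketch. The function name, host, and port are placeholders I made up (use whatever port the tika-app server was actually started on), and I am assuming the server reads until end of stream before replying.

```python
import socket


def parse_via_tika_app(data: bytes, host: str = "localhost",
                       port: int = 12345, timeout: float = 30.0) -> bytes:
    """Send raw document bytes over TCP and return everything written back.

    Hypothetical helper: host and port are placeholders, not Tika defaults.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(data)
        # Half-close the connection: tells the peer the input is complete
        # while leaving the read side open for the response.
        sock.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            chunk = sock.recv(65536)
            if not chunk:  # peer closed the connection, output is complete
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

Note that nothing in the stream selects the output type; that is fixed by the options the server was started with, which is exactly the limitation described above.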
I have downloaded tika-server and it works like a charm. The one thing I can't see how to do is detect the language. The language is neither in the text nor in the metadata. Would I need to fetch the XHTML version of the document and get the language out of the header section? Not sure how to fetch the XHTML TBH - the documentation only covers plain text.

-- Jason

On 01/07/2012 13:34, Jukka Zitting wrote:
> Hi,
>
> On Sun, Jul 1, 2012 at 1:28 PM, Jason Judge wrote:
>> Is it not tika-app running in server mode that I need? tika-server is only
>> about 800kbytes in size, so could not possibly contain all the functionality
>> that the 25Mbyte tika-app contains.
>
> There's a more recent tika-server 1.2-SNAPSHOT version now that's 33MB
> in size. That's the one you want.
>
> Note that currently both the tika-server jar and the --server option
> of tika-app provide somewhat similar functionality. The tika-server
> features are described in the wiki page Chris pointed to, while the
> server mode of the tika-app simply parses documents sent through a
> network connection programmatically or with a tool like netcat [1] and
> responds with the parse output as governed by the rest of the tika-app
> command line options.
>
> [1] http://netcat.sourceforge.net/
>
> BR,
>
> Jukka Zitting
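On the XHTML and language question, here is a hedged sketch of what the requests might look like. It assumes endpoints documented for later tika-server releases (PUT to /tika with an "Accept: text/html" header for XHTML, and PUT to /language/stream for language detection) and the usual default port 9998. I have not confirmed that the 1.2-SNAPSHOT build supports either, so treat both paths as assumptions. The helper names are made up for illustration.

```python
import urllib.request

# Assumption: tika-server listening on its usual default port.
BASE_URL = "http://localhost:9998"


def build_xhtml_request(data: bytes, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a PUT request asking /tika for XHTML rather than plain text."""
    return urllib.request.Request(
        base_url + "/tika",
        data=data,
        method="PUT",
        headers={"Accept": "text/html"},
    )


def build_language_request(data: bytes, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a PUT request for /language/stream (assumed: later releases only)."""
    return urllib.request.Request(
        base_url + "/language/stream",
        data=data,
        method="PUT",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send a prepared request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

With a server running, something like `send(build_xhtml_request(open("doc.pdf", "rb").read()))` would return the XHTML if the endpoint behaves as assumed. If the running version rejects these requests, parsing the language out of the XHTML head may still be the fallback.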
