I see, so tika-app in server mode and tika-server are not the same thing.
tika-app in server mode is just a way of providing an alternative input stream,
but offers no control through that stream over what it actually does.
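
To illustrate what I mean, here is a minimal sketch of a client for that
kind of setup. It assumes tika-app is already listening in server mode on some
host and port, and that it reads the document until the client closes its
sending side; the function name, host, port and timeout are all my own
placeholders, not anything from the Tika docs.

```python
import socket


def parse_via_socket(host, port, document_bytes, timeout=30.0):
    """Send raw document bytes to a listening parser and return its reply.

    Assumption: the server parses whatever arrives on the connection and
    writes the result back on the same connection, then closes it.
    """
    with socket.create_connection((host, port), timeout=timeout) as conn:
        # Push the whole document down the connection.
        conn.sendall(document_bytes)
        # Half-close: signal end of input while keeping the read side open.
        conn.shutdown(socket.SHUT_WR)

        # Collect the response until the server closes its side.
        chunks = []
        while True:
            chunk = conn.recv(65536)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

Nothing in this exchange lets the client say what kind of output it wants;
that is fixed by the options tika-app was started with.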

I have downloaded tika-server and it works like a charm.

The one thing I can't work out is how to detect the language. The language
is neither in the text nor in the metadata. Would I need to fetch the XHTML
version of the document and get the language out of the header section? I'm not
sure how to fetch the XHTML, to be honest; the documentation only covers plain
text.

-- Jason

On 01/07/2012 13:34, Jukka Zitting wrote:
> Hi,
>
> On Sun, Jul 1, 2012 at 1:28 PM, Jason Judge <[email protected]> wrote:
>> Is it not tika-app running in server mode that I need? tika-server is only
>> about 800kbytes in size, so could not possibly contain all the functionality
>> that the 25Mbyte tika-app contains.
> There's a more recent tika-server 1.2-SNAPSHOT version now that's 33MB
> in size. That's the one you want.
>
> Note that currently both the tika-server jar and the --server option
> of tika-app provide somewhat similar functionality. The tika-server
> features are described in the wiki page Chris pointed to, while the
> server mode of the tika-app simply parses documents sent through a
> network connection programmatically or with a tool like netcat [1] and
> responds with the parse output as governed by the rest of the tika-app
> command line options.
>
> [1] http://netcat.sourceforge.net/
>
> BR,
>
> Jukka Zitting

