I've looked at the code, and not being a Java programmer, I may be
misunderstanding it. But from what I can see, the tika server will only return
either the metadata as CSV, or the content as plain text. There are no other
formats supported - no metadata as JSON, or content as XHTML.
Is that correct?
So if I want to access all the same features through the server as I can get
through the command line (tika-app), I would need to extend the tika-server
resources and add new paths that act as parameters? Extending the current
paths (/meta/whatever and /tika/foobar) is kind of blocked, because anything
that follows is treated as a keyword for the log files. Being able to use
/meta/{output-format} would have been nice.
Am I understanding it correctly that tika-server and tika-app are just two
examples of how Tika can be used, thrown together as quick-start demos rather
than as core functionality? The main part of the project would then be the
collection of libraries and tools meant to be used by other Java applications.
It just feels strange that every way of accessing the functionality (server,
app CLI, GUI, app in server mode) has a wildly different interface, with access
to a different range of functionality. So I am guessing they have all been
developed independently as separate "demo interaction" layers rather than as
different ways to access a common set of functionality.
Would that be a fair appraisal? I'm just trying to get a grip, as an outsider,
on how the project is structured and the mindset behind how it all fits
together, so I have a better idea where to find answers and the best approaches
to use for integration.
Regards,
-- Jason
On 01/07/2012 14:05, Jason Judge wrote:
>
> The one thing I can't see how to do is detect the language. The
> language is neither in the text nor in the metadata. Would I need to fetch the
> XHTML version of the document and get the language out of the header section?
> Not sure how to fetch the XHTML TBH - the documentation only covers plain
> text.
>
> -- Jason
>
> On 01/07/2012 13:34, Jukka Zitting wrote:
>
...