Hi Chris

Thanks for your links, etc. I have successfully built and run Tika JAXRS and will look to incorporate it into my component so that users can configure and use it for Tika extraction (currently I have local Tika and SolrCell via a Solr server). I think it is important to provide users with different options depending on their requirements (e.g. performance, simplicity, cost-effectiveness, etc.).

Using Tika JAXRS I can very easily extract metadata, which is great. I am also able to extract content as plain text, but I cannot see a setting for returning content as XML/HTML. Is there a setting for this? Perhaps I'm missing something.

Cheers


Hayden

On 21/07/12 01:31, Mattmann, Chris A (388J) wrote:
Hi Hayden,

Thanks a ton! Yep, I think TikaJAXRS will be a viable option for remote Tika
extraction.

Let me know how I can help.

Thanks much!

Cheers,
Chris

On Jul 20, 2012, at 10:13 AM, Mr Havercamp wrote:

Hi Chris

Thanks for the reply. I will check it out and let you know how I go.

I am developing an extension for Joomla which uses Solr and Tika to index
content and attachments. I have three configuration options for users to select
when specifying a method to extract content and metadata from files: a local
install of the Tika app, SolrCell, or a remote Tika server. In your opinion,
would TikaJAXRS be a viable option for remote Tika extraction (for example,
running on a separate server), especially with regard to performance and security?

Thanks again


Hayden

On 20/07/12 23:30, Mattmann, Chris A (388J) wrote:
Hi Hayden,

Thanks for your email! Have you tried the Tika JAXRS server, documented here:

https://issues.apache.org/jira/browse/TIKA-593
http://wiki.apache.org/tika/TikaJAXRS

It first appeared in 1.2 and can also be run on a port (9998 by default)
to handle cURL interactions.
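
For readers who would rather script this than use cURL, here is a minimal Python sketch of the same kind of interaction. It assumes the `/tika` (text) and `/meta` (metadata) resources described on the TikaJAXRS wiki page linked above, and that the server accepts the raw document bytes in a PUT body. The function names, the `base_url` parameter and the `application/octet-stream` content type are illustrative choices, not something stated in this thread.

```python
import urllib.request


def build_tika_request(base_url: str, endpoint: str, document: bytes) -> urllib.request.Request:
    """Build (but do not send) a PUT request carrying the raw document bytes.

    Assumption: `endpoint` is a resource such as "tika" or "meta" as listed
    on the TikaJAXRS wiki page; check the page for your server version.
    """
    url = base_url.rstrip("/") + "/" + endpoint.lstrip("/")
    # Set a generic content type explicitly; otherwise urllib labels the body
    # as form-encoded data when sending, which is not what a document is.
    headers = {"Content-Type": "application/octet-stream"}
    return urllib.request.Request(url, data=document, headers=headers, method="PUT")


def extract(base_url: str, endpoint: str, document: bytes, timeout: float = 30.0) -> str:
    """Send the document to the server and return the response body as text."""
    request = build_tika_request(base_url, endpoint, document)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

To use it, read the file in binary mode and call `extract(base_url, "meta", data)` for metadata or `extract(base_url, "tika", data)` for text, where `base_url` is whatever host and port your server is listening on.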

Cheers,
Chris

On Jul 20, 2012, at 8:17 AM, Mr Havercamp wrote:

Have been playing around with integrating Tika into my PHP app.

I have had great success with Tika on the command line and also SolrCell.

However, I was wondering if there is some way of running Tika in server mode
and extracting a document, say, via cURL.

I have had varying degrees of success with:

nc localhost 30000 < /opt/lampp/htdocs/joomla25/tmp/InformationRepository.pdf

but I'm wondering how I pass other params such as for extracting just metadata 
or content in html format.
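
For reference, the `nc` command above can be reproduced from a script roughly as follows. This is only a sketch in Python: the function name is hypothetical, the host and port are whatever the Tika process was started with (localhost and 30000 in the command above), and the format of the reply depends on how that process was launched, which is not shown here.

```python
import socket


def send_to_tika_socket(host: str, port: int, document: bytes, timeout: float = 30.0) -> bytes:
    """Send raw document bytes over TCP and return everything the server sends back."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(document)
        # Half-close the connection so the server sees end of input
        # while we can still read its reply.
        conn.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            chunk = conn.recv(65536)
            if not chunk:  # server closed its side: the reply is complete
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

A call would look like `send_to_tika_socket("localhost", 30000, data)`, where `data` is the content of the PDF read in binary mode.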

Any help would be much appreciated.

Cheers


Hayden
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [email protected]
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

