Hayden, I am developing a small PHP library to drive the command line version of Tika to perform a variety of functions. The library handles input and output files, tidying them up when finished, and delivers data in files or open streams.
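The command-line workflow described above can be sketched in a few lines: write the input to a temporary file, run tika-app on it, and always delete the file afterwards. This is a minimal sketch, not the library itself. It is written in Python for brevity, and the same steps map onto PHP's `tempnam`/`proc_open`/`unlink`. The jar path, the helper names, and the `run` injection point are illustrative assumptions. The `--text`, `--html`, `--xml` and `--metadata` switches are tika-app's standard output options.

```python
import os
import subprocess
import tempfile

# tika-app output switches; the jar path passed in is an assumption (e.g. tika-app-1.2.jar).
FORMAT_FLAGS = {"text": "--text", "html": "--html", "xml": "--xml", "metadata": "--metadata"}


def build_tika_command(jar_path, input_path, fmt="text"):
    """Build the argv for a single tika-app invocation."""
    if fmt not in FORMAT_FLAGS:
        raise ValueError(f"unsupported format: {fmt}")
    return ["java", "-jar", jar_path, FORMAT_FLAGS[fmt], input_path]


def extract_from_bytes(data, jar_path, fmt="text", suffix=".bin", run=subprocess.run):
    """Write uploaded bytes to a temp file, run tika-app on it, and always remove the file.

    `run` is a hypothetical injection point so the process call can be swapped out.
    """
    fd, tmp_path = tempfile.mkstemp(suffix=suffix)  # the suffix gives Tika a filename hint
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        result = run(build_tika_command(jar_path, tmp_path, fmt),
                     capture_output=True, check=True)
        return result.stdout  # raw bytes as printed by tika-app
    finally:
        os.remove(tmp_path)  # tidy up even if Tika fails
```

Putting the cleanup in `finally` means a failed extraction (non-zero exit with `check=True`) still leaves no stray files behind.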
I'm doing this primarily for a project to analyse uploaded CVs, but also as a way to get into PSR-0 so that it can be used on a variety of projects. If you are interested, I can send you what I have got so far.

The hope was that the library could be agnostic about how it accesses Tika, whether through the command line, the server, or even something like java-php-bridge. What I have found so far, though, is that these access methods are inconsistent, i.e. they offer different features. There are things you can do from the command line that you can't do in server mode, and vice versa (I've raised a ticket about this). I think a Java/PHP bridge would be best, but I have absolutely no experience with Java servers or setting up custom Java applications, and it's a steep learning curve. Anyway, the ultimate aim is a portable PHP library that uses Tika's features in a consistent way, perhaps with drivers so that whichever method of accessing Tika is available can be used.

-- 
Jason
www.consil.co.uk <http://www.consil.co.uk/>

On 20/07/2012 18:13, Mr Havercamp wrote:
> Hi Chris
>
> Thanks for the reply. I will check it out and let you know how I go.
>
> I am developing an extension for Joomla which uses Solr and Tika to index
> content and attachments. I have three configuration options for users to
> select when specifying a method to extract content and metadata from files: a
> local install of the Tika app, SolrCell, or a remote Tika server. In your
> opinion, would TikaJAXRS be a viable option for remote Tika extraction (for
> example, running on a separate server), especially with regard to performance
> and security?
>
> Thanks again
>
>
> Hayden
>
> On 20/07/12 23:30, Mattmann, Chris A (388J) wrote:
>> Hi Hayden,
>>
>> Thanks for your email!
>> Have you tried the Tika JAXRS server, documented here:
>>
>> https://issues.apache.org/jira/browse/TIKA-593
>> http://wiki.apache.org/tika/TikaJAXRS
>>
>> It first appeared in 1.2 and can also be run on a port (9998 by default)
>> to handle cURL interactions.
>>
>> Cheers,
>> Chris
>>
>> On Jul 20, 2012, at 8:17 AM, Mr Havercamp wrote:
>>
>>> Have been playing around with integrating Tika into my PHP app.
>>>
>>> I have had great success with Tika on the command line and also SolrCell.
>>>
>>> However, I was wondering if there is some way of running Tika in server mode
>>> and extracting a document, say, via cURL.
>>>
>>> I have had varying degrees of success with:
>>>
>>>     nc localhost 30000 < /opt/lampp/htdocs/joomla25/tmp/InformationRepository.pdf
>>>
>>> but I'm wondering how I pass other params, such as for extracting just
>>> metadata or content in HTML format.
>>>
>>> Any help would be much appreciated.
>>>
>>> Cheers
>>>
>>>
>>> Hayden
>>
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Senior Computer Scientist
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 171-266B, Mailstop: 171-246
>> WWW: http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Assistant Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>
>
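The cURL-style interaction with the JAXRS server that the thread asks about amounts to an HTTP PUT of the raw document bytes. Below is a minimal sketch. It assumes the `/tika` (plain text) and `/meta` (metadata) resource paths described on the TikaJAXRS wiki page and a server on `localhost:9998`. Both are assumptions to check against the server version and the port it was actually started on. HTML output is not covered because its endpoint may vary between versions. The sketch is in Python for brevity, and the same request maps directly onto PHP's cURL functions. The function names are hypothetical.

```python
import urllib.request

# Assumed TikaJAXRS resource paths; verify against the server version in use.
ENDPOINTS = {"text": "/tika", "metadata": "/meta"}


def build_tika_request(data, what="text", host="localhost", port=9998):
    """Build (but do not send) a PUT request carrying the raw document bytes."""
    if what not in ENDPOINTS:
        raise ValueError(f"unsupported extraction type: {what}")
    url = f"http://{host}:{port}{ENDPOINTS[what]}"
    # An explicit Content-Type stops urllib defaulting to
    # application/x-www-form-urlencoded, which would mislead Tika's type detection.
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={"Content-Type": "application/octet-stream"},
    )


def extract(data, what="text", host="localhost", port=9998, timeout=30):
    """Send the document to a running Tika server and return the response body."""
    req = build_tika_request(data, what, host, port)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()
```

Usage, assuming a server is reachable on a host called `tika-host`: `extract(open("InformationRepository.pdf", "rb").read(), what="metadata", host="tika-host")`. Keeping request construction separate from sending makes it easy to add authentication or TLS later, which matters for the remote-server scenario raised above.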
