Thanks Chris, good to know I'm on the right track.
I guess the caveat to the approach below is that it fetches the entire file,
so grabbing just the metadata of a large file (say, a video) can take
a while.
I did try passing just the file's headers to the Tika server:
curl -I "http://url/to/my.file" | curl -X PUT -T - http://myserver/tika/meta
and it does attempt to extract the metadata, but the result contains very
little real information:
"Content-Encoding","windows-1252"
"Content-Type","text/plain; charset=windows-1252"
(understandable, as Tika Server receives only the header text, which it
detects as plain text; it needs the entire file to do its magic).
In the meantime, I'm using curl to fetch the file's HTTP headers:
curl -I http://url/to/my.video
HTTP/1.1 200 OK
Date: Thu, 10 Oct 2013 04:01:15 GMT
Last-Modified: Thu, 10 Oct 2013 04:01:15 GMT
ETag: 1381377675619
Expires: Thu, 10 Oct 2013 04:11:15 GMT
Cache-Control: public
Cache-Control: max-age=600
Cache-Control: s-maxage=600
x-entity-prefix: bitstreams
x-entity-reference: /to/my.video
x-entity-url: /to/myfile.html
x-entity-format: html
x-sdata-handler: org.dspace.rest.providers.BitstreamProvider
x-sdata-url: /bitstreams/2416/download
Content-Disposition: attachment; filename=my.video
Content-Type: video/x-ms-wmv;charset=UTF-8
Content-Length: 243062358
Then, if the Content-Type matches my preconfigured list of types I want
to extract, I make a second pass through my Tika server:
curl "http://url/to/my.file" | curl -X PUT -T - http://myserver/tika/meta
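For anyone wanting to script the two passes, here is a minimal sketch in POSIX sh, assuming curl and awk are available. WANTED_TYPES, TIKA_META_URL and the three function names are hypothetical placeholders, not part of Tika. Because servers send values like "video/x-ms-wmv;charset=UTF-8", the sketch lowercases the media type and strips parameters such as charset before comparing it against the list.

```sh
#!/bin/sh
# Two-pass sketch: a HEAD request first, the full download only for wanted
# media types. WANTED_TYPES and TIKA_META_URL are hypothetical placeholders.
WANTED_TYPES="video/x-ms-wmv application/pdf"
TIKA_META_URL="http://myserver/tika/meta"

# Read raw HTTP response headers on stdin and print the media type of the
# last Content-Type header, lowercased and without parameters, e.g.
# "Content-Type: video/x-ms-wmv;charset=UTF-8" -> "video/x-ms-wmv".
media_type_from_headers() {
  awk -F':' '
    tolower($1) == "content-type" {
      value = $0
      sub(/^[^:]*:[ \t]*/, "", value)  # drop the header name
      sub(/;.*/, "", value)            # drop parameters such as charset
      gsub(/[ \t\r]/, "", value)       # drop spaces and the CR of CRLF
      type = tolower(value)
    }
    END { print type }'
}

# Exit status 0 if $1 is one of WANTED_TYPES, 1 otherwise.
is_wanted_type() {
  [ -n "$1" ] || return 1
  for _wanted in $WANTED_TYPES; do
    if [ "$1" = "$_wanted" ]; then
      return 0
    fi
  done
  return 1
}

# Pass 1: headers only (curl -I sends a HEAD request).
# Pass 2: stream the whole file to the Tika Server /meta endpoint.
extract_if_wanted() {
  _type=$(curl -sSI "$1" | media_type_from_headers)
  if is_wanted_type "$_type"; then
    curl -sS "$1" | curl -sS -X PUT -T - "$TIKA_META_URL"
  else
    echo "skipping $1 (Content-Type: ${_type:-unknown})" >&2
  fi
}

# Run only when a URL is given, e.g.  sh extract.sh "http://url/to/my.video"
if [ "$#" -gt 0 ]; then
  extract_if_wanted "$1"
fi
```

Note that the second pass still downloads the entire file, so the caveat about large files still applies; the HEAD request only avoids downloading files whose type you don't want.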
On 10/10/13 10:35, Chris Mattmann wrote:
Looks good to me! Excellent work, and I'm not sure I have
a better way at the moment.
------------------------
Chris Mattmann
[email protected]
-----Original Message-----
From: Mr Havercamp <[email protected]>
Reply-To: <[email protected]>
Date: Wednesday, October 9, 2013 7:27 PM
To: <[email protected]>
Subject: Re: Using TikaJAXRS with remote files
Success!
For anybody else interested:
curl "http://url/to/my.file" | curl -X PUT -T - http://myserver/tika/meta
However, I'd be interested to hear if anybody has a different or more
efficient way of doing this.
On 10/10/13 10:11, Mr Havercamp wrote:
Further to my previous post:
I can send remote files using a combination of the Tika app running in
server mode, curl, and nc:
java -jar tika-app-1.3.jar --server 1234
curl "http://url/to/my.file" | nc localhost 1234
So I guess now the only missing piece is being able to send remote
files to JAXRS for extraction.
On 10/10/13 07:50, Mr Havercamp wrote:
Hi
I've been working with Tika JAXRS and it's working great.
One thing I'm wondering: the standalone Tika app can extract remote
files when given a URL (in both GUI and command-line mode). Is the same
possible with TikaJAXRS, or with the Tika app launched in
server mode?
The reason is that I may run the indexing client on a separate server, so
it won't necessarily have direct access to the file system where
the files to be indexed reside.
Cheers
Hayden