Thanks Chris, good to know I'm on the right track.

I guess the caveat with the approach below is that it fetches the entire file, so extracting only the metadata from a large file (say, a video) can take a while.

I did try passing just the file's HTTP headers to the Tika server:

curl -I "http://url/to/my.file" | curl -X PUT -T - http://myserver/tika/meta

Tika does attempt to extract metadata from that, but the result contains very little real metadata:

"Content-Encoding","windows-1252"
"Content-Type","text/plain; charset=windows-1252"

(Understandable, as Tika Server expects the entire file in order to do its magic.)

In the meantime I'm using curl to obtain the file's metadata from its HTTP headers:

curl -I http://url/to/my.video

HTTP/1.1 200 OK
Date: Thu, 10 Oct 2013 04:01:15 GMT
Last-Modified: Thu, 10 Oct 2013 04:01:15 GMT
ETag: 1381377675619
Expires: Thu, 10 Oct 2013 04:11:15 GMT
Cache-Control: public
Cache-Control: max-age=600
Cache-Control: s-maxage=600
x-entity-prefix: bitstreams
x-entity-reference: /to/my.video
x-entity-url: /to/myfile.html
x-entity-format: html
x-sdata-handler: org.dspace.rest.providers.BitstreamProvider
x-sdata-url: /bitstreams/2416/download
Content-Disposition: attachment; filename=my.video
Content-Type: video/x-ms-wmv;charset=UTF-8
Content-Length: 243062358

Then, if the Content-Type matches my preconfigured list of types to extract, I make a second request and send the full file to my Tika server:

curl "http://url/to/my.file" | curl -X PUT -T - http://myserver/tika/meta
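
For anyone wanting to script this, the two steps could be wrapped roughly as follows. This is only a sketch: the WANTED_TYPES list and the function names are made up for illustration, and the URLs are the same placeholders used above.

```shell
#!/bin/sh
# Sketch only: WANTED_TYPES and all function names are illustrative assumptions.
WANTED_TYPES="video/x-ms-wmv application/pdf"

# Read HTTP response headers on stdin and print the bare media type,
# lowercased and with parameters such as ";charset=UTF-8" removed.
content_type_from_headers() {
    tr -d '\r' |
        awk -F': *' 'tolower($1) == "content-type" {
            sub(/[ \t]*;.*/, "", $2)
            print tolower($2)
            exit
        }'
}

# Succeed (exit status 0) if the given media type is listed in WANTED_TYPES.
is_wanted_type() {
    for t in $WANTED_TYPES; do
        [ "$t" = "$1" ] && return 0
    done
    return 1
}

# Issue a HEAD request first; download the file and forward it to Tika
# only when the reported Content-Type is on the list.
extract_if_wanted() {
    url=$1
    tika=$2
    type=$(curl -sI "$url" | content_type_from_headers)
    if is_wanted_type "$type"; then
        curl -s "$url" | curl -s -X PUT -T - "$tika"
    fi
}
```

Called as `extract_if_wanted "http://url/to/my.video" http://myserver/tika/meta`, this would download the full file only when the HEAD response reports a type on the list, so large unwanted files cost just one header request.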


On 10/10/13 10:35, Chris Mattmann wrote:
Looks good to me! Excellent work and not sure I have
a better way atm..

------------------------
Chris Mattmann
[email protected]




-----Original Message-----
From: Mr Havercamp <[email protected]>
Reply-To: <[email protected]>
Date: Wednesday, October 9, 2013 7:27 PM
To: <[email protected]>
Subject: Re: Using TikaJAXRS with remote files

Success!

For anybody else interested:

curl "http://url/to/my.file" | curl -X PUT -T - http://myserver/tika/meta

However would be interested if anybody else has a different/more
efficient way of doing such an operation.

On 10/10/13 10:11, Mr Havercamp wrote:
Further to my previous post:

I can send remote files using a combination of the tika app running in
server mode, curl and nc:

java -jar tika-app-1.3.jar --server 1234

curl "http://url/to/my.file" | nc localhost 1234

So I guess now the only missing piece is being able to send remote
files to JAXRS for extraction.

On 10/10/13 07:50, Mr Havercamp wrote:
Hi

Been working with tika jaxrs and it is working great.

One thing I'm wondering: the standalone Tika app can extract remote
files when given a URL (in both GUI and command-line mode). Is the
same at all possible with Tika JAX-RS or the Tika app launched in
server mode?

The reason being I may run an indexing client on a separate server so
it wouldn't necessarily have direct access to the file system where
the files to be indexed reside.

Cheers


Hayden

