Get file metadata without retrieving entire file with Tika Server

Mr Havecamp Thu, 13 Oct 2016 08:30:19 -0700

A while back we contributed a workaround we had for extracting metadata/content from remote URLs. It wasn't an ideal way to handle extraction of remote files, but it meant we could index full text from files stored on a completely different server from our JAXRS server.

We're now revisiting this functionality, but the size of the files we store has increased; in some cases we are storing uncompressed video files. Currently, we have two options for extracting metadata from these files:

1) Start the JAXRS server with the enableFileUrl option, new in version 1.14, and pass URLs to Tika Server.

2) Use some kind of wrapper that downloads the file, then sends it on to Tika Server for extraction.
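For comparison, here is a rough client-side sketch of both options. It assumes Tika Server is listening on its default address (localhost:9998), that metadata is requested with a PUT to `/meta` asking for JSON, and, for option 1, that a server started with `-enableFileUrl` reads the document location from a `fileUrl` request header. These details should be checked against the Tika Server documentation for your version; the function names are hypothetical.

```python
import json
import os
import shutil
import tempfile
import urllib.request

# Assumption: Tika Server on its default host/port; adjust for your deployment.
TIKA_SERVER = "http://localhost:9998"


def build_file_url_request(file_url: str, server: str = TIKA_SERVER) -> urllib.request.Request:
    """Option 1: ask Tika Server to fetch file_url itself.

    Assumes the server was started with -enableFileUrl, so it reads the
    document from the URL in the 'fileUrl' header instead of the request body.
    """
    return urllib.request.Request(
        f"{server}/meta",
        data=b"",  # empty body; the server downloads the file from fileUrl
        method="PUT",
        headers={
            "fileUrl": file_url,
            "Accept": "application/json",
            # Generic type so urllib's form-encoded default isn't used as a type hint.
            "Content-Type": "application/octet-stream",
        },
    )


def download_to_temp(file_url: str) -> str:
    """Option 2, step 1: stream file_url to a temporary file and return its path."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as out, urllib.request.urlopen(file_url) as resp:
            shutil.copyfileobj(resp, out)  # copies in chunks, never the whole file in memory
    except BaseException:
        os.remove(path)  # don't leave a partial download behind
        raise
    return path


def send_file_to_tika(path: str, server: str = TIKA_SERVER) -> dict:
    """Option 2, step 2: PUT the downloaded bytes to /meta and parse the JSON reply."""
    with open(path, "rb") as body:
        req = urllib.request.Request(
            f"{server}/meta",
            data=body,
            method="PUT",
            headers={
                "Accept": "application/json",
                "Content-Type": "application/octet-stream",  # let Tika detect the real type
                "Content-Length": str(os.path.getsize(path)),
            },
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)


def metadata_via_file_url(file_url: str, server: str = TIKA_SERVER) -> dict:
    """Option 1 end to end: the server fetches the file."""
    with urllib.request.urlopen(build_file_url_request(file_url, server)) as resp:
        return json.load(resp)


def metadata_via_wrapper(file_url: str, server: str = TIKA_SERVER) -> dict:
    """Option 2 end to end: download locally, then upload to Tika Server."""
    path = download_to_temp(file_url)
    try:
        return send_file_to_tika(path, server)
    finally:
        os.remove(path)  # clean up the temporary copy
```

Calling either `metadata_via_file_url(url)` or `metadata_via_wrapper(url)` with a storage URL returns the parsed metadata. Note that in both cases the whole file crosses the network: with option 1 the server performs the download, with option 2 the wrapper does.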

However, the problem with either option is that we need to retrieve the entire file from storage. This is fine for smaller text files, but when handling these larger files it seems wasteful and time-consuming to download, say, a video file just to extract its metadata (we wouldn't be indexing the video content).

This is probably more of a question for the dev mailing list, but I thought I would start my research here to see whether anyone has a) encountered a similar situation and possibly b) found a potential solution.


Thanks


Hayden
