A while back we contributed a workaround for extracting metadata and content from remote URLs. It wasn't the ideal way to handle extraction of remote files, but it meant we could index full text from files stored on a completely different server from our JAXRS server.

We're now revisiting this functionality but the size of the files we store has increased; in some cases we are storing uncompressed video files. Currently, we have two options to extract metadata from these files:


1) Start the JAXRS server with the enableFileUrl option in the new 1.14 version and pass URLs to Tika Server.

2) Use some kind of wrapper which downloads the file and then sends it on to Tika Server for extraction.
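
To make the two options concrete, here is a rough Python sketch of what each one amounts to. It is only an illustration, not our actual code. It assumes Tika Server is listening on its default port 9998, that metadata is requested from the `/meta` endpoint, and that a `fileUrl` request header is how a URL is handed over when enableFileUrl is on (please check that against the 1.14 docs). The function names are made up for this example.

```python
import json
import urllib.request

# Assumption: Tika Server running locally on its default port.
TIKA_META_URL = "http://localhost:9998/meta"


def request_by_url(file_url, tika_url=TIKA_META_URL):
    """Option 1: ask Tika Server to fetch the remote file itself.

    Assumes the server was started with enableFileUrl and reads the
    location from a 'fileUrl' header (an assumption, not verified here).
    """
    return urllib.request.Request(
        tika_url,
        method="PUT",
        headers={"fileUrl": file_url, "Accept": "application/json"},
    )


def request_with_body(body, tika_url=TIKA_META_URL):
    """Option 2: send bytes we have already downloaded to Tika Server."""
    return urllib.request.Request(
        tika_url,
        data=body,
        method="PUT",
        headers={"Accept": "application/json"},
    )


def fetch_metadata(request):
    """Send a prepared request and decode the JSON metadata response."""
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))


def metadata_via_wrapper(file_url, tika_url=TIKA_META_URL):
    """Option 2 end to end: download the file, then forward it."""
    with urllib.request.urlopen(file_url) as remote:
        # The whole file crosses the network and sits in memory here.
        body = remote.read()
    return fetch_metadata(request_with_body(body, tika_url))
```

In both cases something ends up pulling the complete file: Tika Server in option 1, the wrapper in option 2.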

However, the problem with either option is that the entire file has to be retrieved from storage. This is fine for smaller text files, but for these larger files it seems wasteful and time-consuming to download, say, a video file just to extract its metadata (we wouldn't be indexing the video content).

This is probably more of a question for the dev mailing list, but I thought I would start my research here to see if anyone has a) encountered a similar situation and possibly b) found a potential solution.

Thanks


Hayden
