A while back we contributed a workaround for extracting
metadata/content from remote URLs. It wasn't an ideal way to
handle extraction of remote files, but it meant we could index full text
from files stored on a completely different server from our JAXRS server.
We're now revisiting this functionality, but the size of the files we
store has increased; in some cases we are storing uncompressed video
files. Currently, we have two options to extract metadata from these files:
1) Start the JAXRS server with the enableFileUrl option in the new
1.14 version and pass URLs to Tika Server.
2) Use some kind of wrapper which downloads the file and then sends it
on to Tika Server for extraction.
However, the problem with either option is that we need to retrieve the
entire file from storage. This is fine for smaller text files, but for
these larger files it seems wasteful and time-consuming to download,
say, a video file just to extract its metadata (we wouldn't be indexing
the video content).
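
To make option 2 concrete, here is a minimal sketch of such a wrapper in
Python. It is not our actual code: the helper names are hypothetical, and
it assumes Tika Server is listening on its default address
(localhost:9998) and is queried through the /meta endpoint. It shows where
the cost comes from: the whole file is copied locally before Tika sees a
single byte.

```python
import json
import shutil
import tempfile
import urllib.request

# Assumption: Tika Server running locally on its default port.
TIKA_META_URL = "http://localhost:9998/meta"


def build_meta_request(payload, tika_meta_url=TIKA_META_URL):
    """Build a PUT request asking Tika Server for metadata as JSON."""
    return urllib.request.Request(
        tika_meta_url,
        data=payload,
        method="PUT",
        headers={
            "Accept": "application/json",
            # Generic type so Tika does its own detection on the content.
            "Content-Type": "application/octet-stream",
        },
    )


def extract_metadata(file_url, tika_meta_url=TIKA_META_URL):
    """Download the whole remote file, then send it on to Tika Server."""
    with tempfile.TemporaryFile() as tmp:
        with urllib.request.urlopen(file_url) as remote:
            # The entire file is transferred here, however large it is.
            shutil.copyfileobj(remote, tmp)
        size = tmp.tell()
        tmp.seek(0)
        request = build_meta_request(tmp, tika_meta_url)
        request.add_header("Content-Length", str(size))
        with urllib.request.urlopen(request) as response:
            return json.loads(response.read().decode("utf-8"))
```

Calling extract_metadata() with the URL of a stored file would return the
metadata as a dictionary, but only after the full download has finished,
which is exactly the overhead we would like to avoid for video files.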
This is probably more of a question for the dev mailing list, but I
thought I would start my research here to see if anyone has a)
encountered a similar situation and possibly b) found a potential
solution.
Thanks
Hayden