Using Tika to read only the beginning of binary resources?

Jean Luc Tue, 21 Dec 2010 11:35:51 -0800

I need to build a search engine for metadata stored in binary resources, so
Nutch & Tika are my first choice.


What I hope to find out before plunging too deep is whether Nutch/Tika can
process only part of a file, in practice the header. What counts as the
"header" depends on the resource type, but in all my cases it is the first
few KB of a file that can be much larger. I'm not going to store the
resource itself, so I don't need to read it in its entirety; reading only
the first few KB would be enough. Is this possible with Tika?
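
To make the idea concrete, here is a minimal sketch in plain Java of the kind
of truncated read I have in mind. It uses only the JDK, not Tika or Nutch; the
class name HeaderReader, the method readHeader and the 8 KB limit are made up
for illustration. Whether a parser would accept such a truncated stream for a
given format is exactly the part I am unsure about.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper, not part of Tika or Nutch.
final class HeaderReader {

    /** Reads at most maxBytes from the start of the stream (fewer if it is shorter). */
    static byte[] readHeader(InputStream in, int maxBytes) throws IOException {
        byte[] buf = new byte[maxBytes];
        int total = 0;
        // read() may return fewer bytes than requested, so loop until the
        // buffer is full or the stream ends.
        while (total < maxBytes) {
            int n = in.read(buf, total, maxBytes - total);
            if (n < 0) {
                break; // end of stream reached before maxBytes
            }
            total += n;
        }
        return Arrays.copyOf(buf, total);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a large binary resource; a real one would come from
        // a file or a network fetch.
        byte[] resource = new byte[1_000_000];
        try (InputStream in = new ByteArrayInputStream(resource)) {
            byte[] header = readHeader(in, 8 * 1024); // assumed header size
            // A bounded stream like this is what would be handed to a
            // metadata parser instead of the full resource.
            InputStream bounded = new ByteArrayInputStream(header);
            System.out.println("bytes kept: " + header.length);
        }
    }
}
```

The rest of the resource is never read, which is the saving I am after when
the files are much larger than their headers.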

Thanks,
JL
