I am not an expert on MIME types and how they extend one another. By "binary" I mean any file that is not in human-readable form. I'd like to index every other file. Does that answer your question?
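
To make that definition a bit more concrete, here is a rough sketch in plain Java of what I have in mind. It does not use any Tika API; the class name, method name, and the 30% threshold are all made up for illustration. It treats a sample of bytes as binary if it contains a NUL byte or a high share of control characters.

```java
import java.nio.charset.StandardCharsets;

public class BinaryHeuristic {
    // Hypothetical threshold: more than 30% control bytes in the sample counts as binary.
    static final double MAX_CONTROL_RATIO = 0.30;

    /** Returns true if the sample looks binary under the heuristic described above. */
    public static boolean looksBinary(byte[] sample) {
        if (sample.length == 0) {
            return false; // assumption: an empty file is treated as text
        }
        int control = 0;
        for (byte b : sample) {
            int v = b & 0xFF;
            if (v == 0) {
                return true; // a NUL byte is treated as a sure sign of binary content
            }
            // Count control characters other than tab, line feed, form feed, carriage return.
            if (v < 0x20 && v != '\t' && v != '\n' && v != '\f' && v != '\r') {
                control++;
            }
        }
        return (double) control / sample.length > MAX_CONTROL_RATIO;
    }

    public static void main(String[] args) {
        byte[] text = "plain text\nsecond line\n".getBytes(StandardCharsets.UTF_8);
        byte[] pngLike = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A, 0, 0, 0, 0x0D};
        System.out.println(looksBinary(text));
        System.out.println(looksBinary(pngLike));
    }
}
```

One known weakness of this sketch: UTF-16 text contains NUL bytes, so it would be classified as binary here. A MIME-type-based check would presumably handle that case better.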

On Thu, Jan 11, 2018 at 10:01 AM Nick Burch <[email protected]> wrote:

> On Thu, 11 Jan 2018, Kudrettin Güleryüz wrote:
> > Does Tika library provide an efficient binary file check?
>
> How do you define "binary"?
>
> Only things with a mimetype that starts text/ ? Or do you want to include
> application/xml files? Or things that extend from XML like DIF and
> FictionBook? Only things that contain ascii-printable characters? Other?
>
> We need to know your definition of binary to be able to suggest!
>
> Nick
