I guess the question is how far we want to bake this in.  I could
see adding a field for the default extension to the
CompositeDetector/DefaultDetector.  That would then be triggered on
embedded files, too.  I can't imagine it would add much computational
cost (???), and it would just show up for free all over the
place.
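Here is a very rough sketch of what I mean, in plain Java so it stands alone. The `Detection` record, the `withDefaultExtension` helper and the little extension map are all made-up stand-ins, not Tika API. In real code the lookup would presumably go through the `MimeType` we already have, which has `getExtension()`. Treating the first registered extension as the "default" is also just an assumption here. The point is that it comes down to one map lookup per detected document.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

/**
 * Illustrative sketch only: a detection result that also carries a default
 * file extension. None of these names are Tika API.
 */
public class DefaultExtensionSketch {

    // Hypothetical stand-in for the mime-type registry; the real extension
    // lists already live on Tika's MimeType objects.
    static final Map<String, List<String>> EXTENSIONS = Map.of(
            "application/pdf", List.of(".pdf"),
            "image/jpeg", List.of(".jpg", ".jpeg"));

    /** The detected media type plus an optional default extension. */
    record Detection(String mediaType, Optional<String> defaultExtension) {}

    /**
     * Takes an already-detected type and attaches its default extension,
     * assumed here to be the first registered one. Unknown types get
     * Optional.empty(). Costs a single map lookup.
     */
    static Detection withDefaultExtension(String detectedType) {
        List<String> exts = EXTENSIONS.getOrDefault(detectedType, List.of());
        return new Detection(detectedType,
                exts.isEmpty() ? Optional.empty() : Optional.of(exts.get(0)));
    }

    public static void main(String[] args) {
        System.out.println(withDefaultExtension("image/jpeg"));
        System.out.println(withDefaultExtension("application/octet-stream"));
    }
}
```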

It does feel a bit smelly to add this one feature, but I've done worse
in my career. :(

Or, do we want a custom handler/parameter on the detect/ endpoint in
tika-server?
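For the tika-server route, here is an equally rough sketch of a flag that switches a type response between a short form and a fuller one with the description and extensions. The `details` flag, the `TypeInfo` record, the JSON field names and the example values are all hypothetical, not existing tika-server API. The hand-rolled JSON only keeps it dependency-free.

```java
import java.util.List;
import java.util.stream.Collectors;

/**
 * Illustrative sketch: an optional "details" flag that adds description and
 * extensions to a mime-type response. Names are hypothetical.
 */
public class TypeDetailsSketch {

    record TypeInfo(String name, String description, List<String> extensions) {}

    /** Minimal JSON string quoting (backslashes and double quotes only). */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Always emits the type; emits description and extensions only when details is true. */
    static String render(TypeInfo info, boolean details) {
        StringBuilder sb = new StringBuilder("{\"type\":").append(quote(info.name()));
        if (details) {
            sb.append(",\"description\":").append(quote(info.description()));
            String exts = info.extensions().stream()
                    .map(TypeDetailsSketch::quote)
                    .collect(Collectors.joining(",", "[", "]"));
            sb.append(",\"extensions\":").append(exts);
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        // Example values made up for illustration.
        TypeInfo pdf = new TypeInfo("application/pdf", "PDF document", List.of(".pdf"));
        System.out.println(render(pdf, false));
        System.out.println(render(pdf, true));
    }
}
```

Defaulting the flag to off keeps the existing response unchanged for current clients.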

Is the use case that you want to parse the file _and_ get this
information in one go?  Or, are you only running detect on the
main/container file?

On Thu, Feb 17, 2022 at 2:00 PM Nick Burch wrote:
>
> On Thu, 10 Feb 2022, Nick Burch wrote:
> > On Thu, 10 Feb 2022, Willy T. Koch wrote:
> >> …and calling it as a webservice with Postman/curl.
> >
> > Ah, I think we might not be exposing the full details of the mime types via
> > the server, only details of their parsers and the hierarchy, e.g.
> > http://localhost:9998/mime-types#audio/vorbis
> >
> > (We have that info in Java; we just don't seem to be making it available.)
> >
> >
> > I'm not sure about exposing all the details of all the types by default,
> > but adding a flag and/or a sub-endpoint that would return the full
> > details of a type, including extensions and comments etc., seems OK to
> > me. Thoughts, anyone?
>
> Tika devs - any thoughts on this? It's a pretty small code change (we
> already have the data on the mime type!); we just need feedback on
> extending the existing API vs. adding a new one.
>
> Nick
