Re: Returning file extension alongside mime-type?

2022-03-11 Thread Nick Burch
On Tue, 8 Mar 2022, Willy T. Koch wrote: That’s fantastic, thank you! Looking forward to testing when the Tika Docker repo is updated with this release. That may take a few weeks, but if you don't mind building Tika from source, you should be able to give it a whirl now. (As far as I'm aware

Re: Returning file extension alongside mime-type?

2022-03-08 Thread Willy T. Koch
That’s fantastic, thank you! Looking forward to testing when the Tika Docker repo is updated with this release. Regards, Willy T. Koch On Tue, 8 Mar 2022, at 00:35, Nick Burch wrote: > On Fri, 18 Feb 2022, Willy T. Koch wrote: > > On Thu, 17 Feb 2022, at 20:00, Nick Burch wrote: > >> Tika

Re: Returning file extension alongside mime-type?

2022-03-07 Thread Nick Burch
On Fri, 18 Feb 2022, Willy T. Koch wrote: On Thu, 17 Feb 2022, at 20:00, Nick Burch wrote: Tika devs - any thoughts on this? It's a pretty small code change (we already have the data on the mime type!), just need feedback on extending the existing API vs adding a new one By also returning the

Re: Returning file extension alongside mime-type?

2022-02-24 Thread Nick Burch
On Thu, 24 Feb 2022, Tim Allison wrote: A separate endpoint, then? That would be cleaner. We already have some mime details related endpoints, would be an extension or related endpoint to those, see earlier-thread: https://lists.apache.org/thread/jlym8ypnrj978hmzjgvkc1fpxnc7g51h Nick

Re: Returning file extension alongside mime-type?

2022-02-24 Thread Tim Allison
A separate endpoint, then? That would be cleaner. On Thu, Feb 24, 2022 at 6:31 AM Nick Burch wrote: > > On Tue, 22 Feb 2022, Tim Allison wrote: > > I guess the question is how far do we want to bake this in? I could see > > adding a field for the default extension in the > > CompositeDetector/D

Re: Returning file extension alongside mime-type?

2022-02-24 Thread Nick Burch
On Tue, 22 Feb 2022, Tim Allison wrote: I guess the question is how far do we want to bake this in? I could see adding a field for the default extension in the CompositeDetector/DefaultDetector. This would then be triggered on embedded files, too. I can't imagine this would add much cost co

Re: Returning file extension alongside mime-type?

2022-02-22 Thread Tim Allison
I guess the question is how far do we want to bake this in? I could see adding a field for the default extension in the CompositeDetector/DefaultDetector. This would then be triggered on embedded files, too. I can't imagine this would add much cost computationally(???), and it would just show up

Re: Returning file extension alongside mime-type?

2022-02-18 Thread Willy T. Koch
On Thu, 17 Feb 2022, at 20:00, Nick Burch wrote: > On Thu, 10 Feb 2022, Nick Burch wrote: > > On Thu, 10 Feb 2022, Willy T. Koch wrote: > >> …and calling it as a webservice with Postman/curl. > > > > Ah, I think we might not be exposing the full details of the mime types via > > the server, only

Re: Returning file extension alongside mime-type?

2022-02-17 Thread Nick Burch
On Thu, 10 Feb 2022, Nick Burch wrote: On Thu, 10 Feb 2022, Willy T. Koch wrote: …and calling it as a webservice with Postman/curl. Ah, I think we might not be exposing the full details of the mime types via the server, only details of their parsers and the hierarchy, eg http://localhost:999

Re: Returning file extension alongside mime-type?

2022-02-10 Thread Nick Burch
On Thu, 10 Feb 2022, Willy T. Koch wrote: …and calling it as a webservice with Postman/curl. Ah, I think we might not be exposing the full details of the mime types via the server, only details of their parsers and the hierarchy, eg http://localhost:9998/mime-types#audio/vorbis (We have that

Re: Returning file extension alongside mime-type?

2022-02-10 Thread Willy T. Koch
…and calling it as a webservice with Postman/curl. Willy On Thu, 10 Feb 2022, at 22:43, Willy T. Koch wrote: > Ah, that’s good news, will look into that! > > I’ve only been using the 2.2.1-full official Tika docker image with default > config, only added some more Tesseract languages for OCR.

Re: Returning file extension alongside mime-type?

2022-02-10 Thread Willy T. Koch
Ah, that’s good news, will look into that! I’ve only been using the 2.2.1-full official Tika docker image with default config, only added some more Tesseract languages for OCR. Kind regards, Willy T. Koch t...@kochkonsult.no Mob: +47 480 321 77 On Thu, 10 Feb 2022, at 22:40, Nick Bur

Re: Returning file extension alongside mime-type?

2022-02-10 Thread Nick Burch
On Thu, 10 Feb 2022, Willy T. Koch wrote: As for content detection, today the content-type field with mime type is returned. What we would need is a mime-type to file extension lookup and it seems logical that this was also returned by Tika. How are you calling Tika? We already have APIs for t

Returning file extension alongside mime-type?

2022-02-10 Thread Willy T. Koch
Hi, New Tika user here. Really impressed by the Tika toolkit and we’re planning to use it as a Docker service in our case management solution used by the public sector in the Nordics, for many different use cases. As for content detection, today the content-type field with mime type is return