Not totally on topic, but I think it is related to this thread. I'm currently exploring using Tika as a library in Apache Spark. This approach suffers from the same problems as using Tika as a library mentioned above. Has anyone used Tika as a library in a Spark job? Or would it still make sense to use something external like tika-server? That seems like it might run counter to the point of using Spark in the first place.
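
For concreteness, here is a rough sketch of what the external option could look like from inside a job. It is not a tested setup: it assumes a tika-server instance is reachable from every executor on its default port 9998, and it relies on the server's `PUT /tika` endpoint returning plain text when asked via the `Accept` header. The names `TIKA_URL`, `extract_text`, and `extract_partition` are made up for illustration.

```python
import urllib.request

# Assumption: tika-server listens on its default port on the same host as the executor.
TIKA_URL = "http://localhost:9998/tika"


def extract_text(data: bytes, tika_url: str = TIKA_URL, timeout: float = 60.0) -> str:
    """PUT raw document bytes to tika-server and return the extracted plain text."""
    request = urllib.request.Request(
        tika_url,
        data=data,
        method="PUT",
        headers={"Accept": "text/plain"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_partition(records, tika_url: str = TIKA_URL):
    """Process one partition of (path, bytes) pairs.

    Yields (path, text, error) so that a single bad file or an unreachable
    server is recorded as a failed row instead of failing the whole task.
    """
    for path, data in records:
        try:
            yield path, extract_text(bytes(data), tika_url), None
        except OSError as exc:  # URLError, HTTPError and timeouts are OSError subclasses
            yield path, None, str(exc)
```

In PySpark this would be plugged in as something like `sc.binaryFiles(input_dir).mapPartitions(extract_partition)`, where `input_dir` is whatever path holds the documents. The parsing then happens in a separate JVM, so a parser crash or memory blow-up shows up as an error row rather than a lost executor, at the cost of shipping every file over HTTP.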

On Tue, Nov 24, 2020 at 10:46 AM Slava G <[email protected]> wrote:

> We have been using Tika as a Java library for a few years now, parsing
> millions of different files each day. We are now switching to tika-server
> because bugs in various Tika components (dependencies) caused issues such
> as JVM exits and memory problems. Also, Tika and its dependencies bring
> in a lot of other dependencies, so the server should simplify
> maintainability and reduce JAR hell.
>
> So, this is our road from Tika as a Java library to Tika as a server 😀
>
> Thanks
>
> On Tue, Nov 24, 2020, 09:28 Ralph Soika <[email protected]> wrote:
>
>> Hi Robert,
>>
>> In the sense of a microservice architecture, it makes absolute sense to
>> use Tika as a server/microservice component. As Tim Allison explained,
>> this helps you separate your business requirements into isolated
>> components (each running in its own JVM).
>>
>> If you don't need to link the Tika function closely to your code, then
>> use the server option wherever possible.
>>
>> Best regards
>>
>> Ralph
>>
>> On 23.11.20 21:36, Robert Raines wrote:
>>
>> Hi,
>>
>> I am using Tika to extract text from Word docs and PDFs locally. It's
>> great. Thank you, Apache and Tika developers!
>>
>> Could someone help me understand why Tika offers a client-server option
>> instead of just a code library? I am sure there was/is a good reason, so
>> I am curious if anyone knows, or if there are some resources that explain
>> the history of how/why Tika also has its API architecture.
>>
>> Thanks so much,
>> Robert
>>
>> --
>>
>> *Imixs Software Solutions GmbH*
>> *Web:* www.imixs.com *Phone:* +49 (0)89-452136 16
>> *Office:* Agnes-Pockels-Bogen 1, 80992 München
>> Registergericht: Amtsgericht Muenchen, HRB 136045
>> Geschaeftsführer: Gaby Heinle u. Ralph Soika
>>
>> *Imixs* is an open source company, read more: www.imixs.org
