Not totally on topic, but I think it's related to this thread. I'm currently
exploring using Tika as a library in Apache Spark. This approach suffers from
the same problems as using Tika as a library mentioned above. Has anyone used
Tika as a library in a Spark job? Or would it still make sense to use
something external like tika-server? That seems like it might be counter to
the point of using Spark in the first place.
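
To make the external option concrete, here is a minimal sketch of what a call to
tika-server could look like from inside a job. It assumes a tika-server instance
is already running at http://localhost:9998 (the server's default port) and uses
its PUT /tika endpoint with an "Accept: text/plain" header. The class and method
names (TikaServerClient, buildRequest, extractText) are made up for illustration,
and the error handling is deliberately thin. It uses only the JDK 11+ HTTP client,
so it adds no Tika dependencies to the job's classpath.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: sends raw document bytes to a tika-server instance.
public class TikaServerClient {

    // Build a PUT request against the server's /tika endpoint.
    // baseUrl is an assumption, e.g. "http://localhost:9998".
    static HttpRequest buildRequest(String baseUrl, byte[] document) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/tika"))
                .header("Accept", "text/plain") // ask for plain extracted text
                .PUT(HttpRequest.BodyPublishers.ofByteArray(document))
                .build();
    }

    // Send the document and return the extracted text, or fail on a non-200 reply.
    static String extractText(HttpClient client, String baseUrl, byte[] document)
            throws IOException, InterruptedException {
        HttpResponse<String> response =
                client.send(buildRequest(baseUrl, document),
                            HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("tika-server returned HTTP " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Usage: java TikaServerClient /path/to/file.pdf
        byte[] bytes = Files.readAllBytes(Path.of(args[0]));
        HttpClient client = HttpClient.newHttpClient();
        System.out.println(extractText(client, "http://localhost:9998", bytes));
    }
}
```

In a Spark job, something like extractText would presumably be called once per
record inside a per-partition function, with one HttpClient created per partition.
A parser crash would then take down the server process rather than the executor
JVM. Whether the extra network hop is worth it is exactly the question above.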

On Tue, Nov 24, 2020 at 10:46 AM Slava G <[email protected]> wrote:

> We have been using Tika as a Java library for a few years now, parsing
> millions of different files each day. We're now switching to tika-server
> because bugs in various Tika components (dependencies) caused issues
> such as JVM exits and memory problems. Also, Tika and its
> dependencies bring in a lot of other dependencies, so the switch should
> simplify maintainability and reduce JAR hell.
>
> So, this is our road from Tika as a Java library to Tika as a server 😀
>
> Thanks
>
> On Tue, Nov 24, 2020, 09:28 Ralph Soika <[email protected]> wrote:
>
>> Hi Robert,
>>
>> In terms of a microservice architecture, it makes absolute sense to
>> use Tika as a server/microservice component. As Tim Allison explained, this
>> helps you separate your business requirements into isolated components
>> (each running in its own JVM).
>>
>> If you don't need to link the Tika function closely to your code then use
>> the server option wherever possible.
>>
>>
>> Best regards
>>
>> Ralph
>>
>>
>> On 23.11.20 21:36, Robert Raines wrote:
>>
>> Hi,
>>
>> I am using Tika to extract text from Word Docs and PDFs locally. It's
>> great. Thank you Apache and Tika developers!
>>
>> Could someone help me understand why Tika offers a client-server option
>> instead of just a code library? I am sure there was/is a good reason, so I
>> am curious if anyone knows or if there are some resources that explain the
>> history of how/why Tika also has its API architecture.
>>
>> Thanks so much,
>> Robert
>>
>>
>>
>> --
>>
>> *Imixs Software Solutions GmbH*
>> *Web:* www.imixs.com *Phone:* +49 (0)89-452136 16
>> *Office:* Agnes-Pockels-Bogen 1, 80992 München
>> Registergericht: Amtsgericht Muenchen, HRB 136045
>> Geschaeftsführer: Gaby Heinle u. Ralph Soika
>>
>> *Imixs* is an open source company, read more: www.imixs.org
>>
>
