Great.  I opened TIKA-2514 to track this.  Pull requests are welcome! 😊

-----Original Message-----
From: Jim Idle [mailto:[email protected]] 
Sent: Wednesday, November 29, 2017 8:58 PM
To: [email protected]
Subject: RE: Very slow parsing of a few PDF files

That would be a more practical alternative. I have time scheduled next week for
an in-house solution, but I will first look properly at ForkParser and see if I
could make something akin to that in a generic and configurable fashion. If so,
I will submit the code.

Jim 

> -----Original Message-----
> From: Allison, Timothy B. [mailto:[email protected]]
> Sent: Wednesday, November 29, 2017 23:52
> To: [email protected]
> Subject: RE: Very slow parsing of a few PDF files
> 
> >I am going to have to write my own application specific solution
> 
> Ugh.  I'm sorry.  If there's anything shareable, please do share.
> 
> > ForkParser tries to serialize every class it thinks will be needed
> > across the connection, and a lot of third-party classes are not
> > serializable. I think that ForkParser is a good enough idea but I am
> > not sure how practical it is in a real-life application.
> 
> You make a very good point.  We've had issues serializing our own
> parsers...let alone user-specific add-ons.  I wonder if we could modify
> ForkClient to kick off the forkserver process from a user-specified "bin"
> directory (instead of the current bootstrapped jar), and that bin
> directory could include at least the tika-core.jar,
> tika-fat-parsers.jar and tika-serialization.jar but could also
> include optional dependencies and user-specific dependencies.
> 
> Hmmm....
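
For anyone picking this up, the "bin" directory idea above could be sketched
roughly as follows. This is only an illustration of the approach under stated
assumptions, not how ForkClient currently works: the class name
ForkServerLauncherSketch, the methods jarsIn/buildCommand/start, and the
SERVER_MAIN_CLASS placeholder are all hypothetical. The sketch puts every jar
found in the user-specified directory on the child JVM's classpath, so that
optional and user-specific dependencies are loaded by the child process
directly rather than serialized across the connection.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ForkServerLauncherSketch {

    // Placeholder for the server's entry point; an assumption, not a real Tika class name.
    static final String SERVER_MAIN_CLASS = "com.example.ForkServerMain";

    /** Collects the absolute path of every *.jar in binDir, sorted for a stable classpath. */
    static List<String> jarsIn(Path binDir) throws IOException {
        List<String> jars = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(binDir, "*.jar")) {
            for (Path jar : stream) {
                jars.add(jar.toAbsolutePath().toString());
            }
        }
        Collections.sort(jars);
        return jars;
    }

    /** Builds the command line: java -cp <all jars in binDir> <server main class>. */
    static List<String> buildCommand(Path binDir) throws IOException {
        List<String> jars = jarsIn(binDir);
        if (jars.isEmpty()) {
            throw new IOException("No jars found in " + binDir);
        }
        List<String> command = new ArrayList<>();
        command.add("java");
        command.add("-cp");
        command.add(String.join(File.pathSeparator, jars));
        command.add(SERVER_MAIN_CLASS);
        return command;
    }

    /** Starts the child JVM; stderr is merged into stdout so a single stream can be drained. */
    static Process start(Path binDir) throws IOException {
        return new ProcessBuilder(buildCommand(binDir))
                .redirectErrorStream(true)
                .start();
    }

    public static void main(String[] args) throws IOException {
        // Prints the command that would be run for the given bin directory.
        System.out.println(String.join(" ", buildCommand(Paths.get(args[0]))));
    }
}
```

With something like this, a user would drop tika-core.jar, the parsers jar,
the serialization jar and any of their own dependencies into one directory and
point the client at it. How the client and server would then talk to each
other is left out of the sketch entirely.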
