Hi Vinay,

If you know which documents these are, it would be great to get hold of one of them.
Alternatively, it might be helpful to provide a thread dump of the ManifoldCF agents process once it has finished all the other documents and is stuck only on the ones that are "hung" inside Tika. This should prompt a Tika bug report, and a document and a stack trace would be key for that. If you can give us a document and a corresponding stack trace, please create a ticket (https://issues.apache.org/jira) and attach the file to it.

Thanks,
Karl

On Thu, May 24, 2018 at 2:11 AM VINAY Bengaluru <[email protected]> wrote:

> Hi Karl,
> We have Manifold CF 2.9.1 setup and job configured to do a
> filesystem crawling followed by tika parser(Manifold CF one) and then
> posting to Solr Cloud.
> Though the crawling and indexing goes smoothly for most of the files,
> there are a certain files including docs, pdfs which get hung at Tika
> transformation stage. There is no errors in the logs and the History page
> shows the file which is at tika parsing stage.
> Any idea why the job hungs and doesn't come out of the tika transformation
> stage? How could we handle such scenario as we setup a scheduled job for
> continuous crawling?
>
> Thanks and regards,
> Vinay B S
