Hello Karl, I have checked the files, and our provider made a mistake when generating the PDFs for this server. We have a null joined scan parameter.
We have similar errors on other servers, with no error log. I will look into those as well. So, for this PDF error, we are fine: it was just a mistake in the files. For the other servers, I will check and get back to you.

Thanks,
Maxence.

From: Karl Wright [mailto:[email protected]]
Sent: Monday, May 28, 2018 18:47
To: [email protected]
Subject: Re: org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:838) error SPAM 10Go/hour

This sounds potentially like a problem in Tika, but in order to be sure I would need a complete stack trace, not just a piece of one. If it is a Tika issue, it should appear reliably on the same document, again and again. Is there any way you can crawl ONLY one of the documents that got blocked? I suspect that when you paused and restarted, you just postponed the problem and it will happen again.

Karl

On Mon, May 28, 2018 at 9:50 AM msaunier <[email protected]> wrote:

Hello Karl,

In ManifoldCF 2.9, for all jobs, several documents (around twenty) remain blocked at the end of the job. A single error appears and spams the logs with several gigabytes in a short time, which fills up the servers:

    [?:?]
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:838) ~[?:?]
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:495) ~[?:?]
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processTransparencyGroup(PDFStreamEngine.java:231) ~[?:?]

If I pause the job and restart it, the documents are sent and it works. But if I'm not there, we have problems.

Do you know this problem, and do you have a solution? Is it a bad configuration?

Thank you.
