Hello Karl,

I have checked the files, and our provider made a mistake when generating the PDFs for 
this server: the joined scan parameter is null.

We have similar errors on other servers, with no error log. I will look into those too.

 

So this PDF error is OK; it is just an error in the file.

For the other servers, I will check and get back to you.

 

Thanks,

Maxence.

 

 

From: Karl Wright [mailto:[email protected]] 
Sent: Monday, May 28, 2018 18:47
To: [email protected]
Subject: Re: org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:838) error SPAM 10 GB/hour

 

This sounds potentially like a problem in Tika, but in order to be sure I would 
need a complete stack trace, not just a piece of one.

If it is a Tika issue, it should appear reliably on the same document, again 
and again.

 

Is there any way you can crawl ONLY one of the documents that got blocked?  I 
suspect that when you paused and restarted, you just postponed the problem and 
it will happen again.

 

Karl

 

 

On Mon, May 28, 2018 at 9:50 AM msaunier <[email protected]> wrote:

Hello Karl,

 

In ManifoldCF 2.9, at the end of every job, several documents (around twenty) 
remain blocked.

A single error appears and floods the logs with several gigabytes in a short 
time, which filled up the servers:

 

[?:?]
        at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:838) ~[?:?]
        at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:495) ~[?:?]
        at org.apache.pdfbox.contentstream.PDFStreamEngine.processTransparencyGroup(PDFStreamEngine.java:231) ~[?:?]

 

If I pause the job and restart it, the documents are sent and it works. But if I'm 
not there to do that, we have problems.

 

Do you know about this problem, and do you have a solution? Is it a bad configuration?

 

Thank you.
