Greetings,

I hope this question is not too general. I am really just looking for 
other people's experiences.

We are running one of the latest versions of Tesseract in containers on a 
Kubernetes cluster, and performance is not great. We are doing PDF 
conversion, and generating the searchable PDF is the step that takes the 
longest.

~30 seconds per page.

Each pod/container has 2 cores and 4 GB of memory.

We are experimenting with various configurations (cores, memory, and now 
threads), based on what I have read here and on GitHub.

For those of you running in containers, what are you setting your resources 
to? I am just looking for a range of answers, on average.

We are planning on doing some testing as follows.

Test 1:
  Tesseract thread count: 2
  Pod core count: 2
  Pod memory: 2
  Job count: 2

Test 2:
  Tesseract thread count: 1
  Pod core count: 1
  Pod memory: 2
  Job count: 1

Test 3:
  Tesseract thread count: 1
  Pod core count: 1
  Pod memory: 1
  Job count: 1


Job count is the number of documents a pod works on concurrently; jobs are 
assigned by our orchestration engine.
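
To keep the runs comparable, this is roughly how we are thinking of 
scripting each measurement. It is only a sketch under some assumptions: that 
"Tesseract thread count" means the OpenMP limit set through the 
OMP_THREAD_LIMIT environment variable, that the tesseract binary is on the 
PATH inside the container, and that RunConfig, build_env, build_command and 
time_page are just illustrative names of ours. Pod cores and memory are set 
in the pod spec, so the script only records them as labels.

```python
import os
import subprocess
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    tesseract_threads: int  # assumed to map to OMP_THREAD_LIMIT
    pod_cores: int          # set in the pod spec; recorded here as a label only
    pod_memory: int         # same units as in the test plan above
    job_count: int          # concurrent documents per pod


# The three configurations from the test plan above.
CONFIGS = [
    RunConfig(tesseract_threads=2, pod_cores=2, pod_memory=2, job_count=2),
    RunConfig(tesseract_threads=1, pod_cores=1, pod_memory=2, job_count=1),
    RunConfig(tesseract_threads=1, pod_cores=1, pod_memory=1, job_count=1),
]


def build_env(threads, base_env=None):
    """Copy the environment and cap OpenMP threads for the child process."""
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_THREAD_LIMIT"] = str(threads)
    return env


def build_command(image_path, output_base):
    """Command line for one page; the "pdf" config writes output_base.pdf."""
    return ["tesseract", image_path, output_base, "pdf"]


def time_page(image_path, output_base, threads):
    """Run Tesseract on one page image and return wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(
        build_command(image_path, output_base),
        env=build_env(threads),
        check=True,
    )
    return time.perf_counter() - start
```

Inside a pod running a given configuration, something like 
time_page("page.png", "out", cfg.tesseract_threads) (with a placeholder 
image name) would give the seconds per page to compare against the ~30 
seconds we see today.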


I have not found many posts from people sharing numbers like the above. I 
think 30 seconds per page is WAY too long, so we are trying to optimize 
everything we can.


Thanks in advance for any insight.
