I'm using Tesseract 3.04.01 (with leptonica-1.73) on macOS 10.12 to 
extract the text from a clean screenshot of a web page. 

Here is the command:


    tesseract screen.png output.txt


screen.png:


[image: screen.png] 
<https://camo.githubusercontent.com/c82fb95cab29d3a05e1694ee5cd2b2365b60bbdf/68747470733a2f2f692e737461636b2e696d6775722e636f6d2f77667745692e706e67>


output.txt:


a CSS Regwstratmnfi x

C (D localnostr

Accoum Dexans

Eu a Pine: 5" a

Fifi/(‘3’ 22pm; J. , km?“ ”9

Persuna‘ Dexaus Funhev \muvmanun

«m s , (35‘ m Was :6 ms

FMS, Emms' (u v Jaruawy

*1: \(uax y ,

Chum

Terms and Mamng
m any ‘ ‘ Regwsley»

w lc‘asehe :avicxflza \zh»,:\':\e

Mm , (ism-ye I/Exzavheilédgémzéi


The output is complete garbage except for a few words like "Terms and". 

I've read the "ImproveQuality" wiki page 
<https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality>, but 
I don't think any of the cases it describes applies to this image. 

Could anyone please tell me which command-line options I should set to get 
usable output? 


Thanks in advance!
