On Thu, Jan 8, 2009 at 10:07 AM, Darren Govoni <[email protected]> wrote:
>
> Hi,
>  I read that tesseract supports a variety of languages. I converted a
> Spanish text JPG to TIFF and ran tesseract with the Spanish language pack,
> but the output text was not even close. Here is the image link:
>
> http://www.libertas.hu/slike/Spanish%20text.jpg
>
> Here was the output .txt (gibberish):
>
> _ físbřts-q:!:
> J4Lçar'tr.r1S.r:t:
> cærrtraø. CSS.
> 22C} (puerta
> ğrcupietaria)
> Er: |::•ler'•c CC
> guæ- En ve
> (SCI a 7O êj
> Scruõs, Eu'1 1
> la psàtirna 1::
> trate: pcvr ÇE
> elgc czlæ irug
> prirrtære pdă
> era la plants
> erreglscics
> \/Istæ sobre
>
>
> What is the trick to getting correct results?
>
> Thank you.
> Darren

Your image has a lot of grey in it. It would be more helpful to see
the TIFF file you used than the JPG.

I used the Gimp's 'levels' tool to increase the contrast, then converted
the image to a 1-bit palette and saved it as a TIFF.
You can see the resulting image here:
http://stuporglue.org/downloads/spanish.tif
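
For anyone who would rather script the cleanup than do it by hand, here is
a rough sketch of the two steps in plain Python. It is not what the Gimp
does internally, just the same idea: stretch the grey levels so the page
becomes white and the ink black, then threshold to 1 bit. The function
names, the black/white points (80 and 200) and the threshold (128) are my
own assumptions for illustration, and the sample pixel values are made up.
Loading the JPG and writing the TIFF would need an imaging library, which
is not shown here.

```python
def stretch_levels(pixels, black, white):
    """Linearly map `black` (and darker) to 0 and `white` (and lighter) to 255.

    `pixels` is a list of rows of 8-bit grayscale values (0-255).
    """
    if white <= black:
        raise ValueError("white point must be greater than black point")
    scale = 255.0 / (white - black)
    return [
        [min(255, max(0, round((p - black) * scale))) for p in row]
        for row in pixels
    ]


def to_one_bit(pixels, threshold=128):
    """Reduce grayscale to 1 bit: 1 for light (page), 0 for dark (ink)."""
    return [[1 if p >= threshold else 0 for p in row] for row in pixels]


# Hypothetical 2x4 patch: dark-grey ink on a light-grey page.
patch = [
    [200, 90, 85, 210],
    [205, 80, 195, 200],
]

# Assumed black/white points; in practice pick them from the histogram.
cleaned = to_one_bit(stretch_levels(patch, black=80, white=200))
print(cleaned)
```

The right black and white points depend on the scan, so expect to adjust
them per image rather than reuse the values above.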

The command I used was 'tesseract spanish.tif sp -l spa' (I have
tesseract 2.0.3 installed), and it produced this output:

I Habitsclonas da Ann (AnEls
Aparrmsnts; plano en Cøior del
centro, G5, 61): Prijsko. 7. D 321~
220 (peru es ram encontrar u la
propletarxa) y 09B­503·28E (móvil).
En pleno corazón de la ciudad anti-
gua. En verano. de 444 a 518 Kn
(60 a 70 É) la noche para dos per­
Sonas. En una Casa adornada por
la pátina de los siglos. Muy buen
trato por parte de Ana, quien habla
algo de inglés. DOS estudios en IE
primera planta y un apartamento
en la planta baja. impecables y bian
arreglados, con ducha y Cocina.
Vista sobre las Callejas del barrio.

I think if you do some image cleanup before processing your images with
tesseract, you will get much better results.


-- 
Michael Moore
