On Thu, Jan 8, 2009 at 10:07 AM, Darren Govoni <[email protected]> wrote:
>
> Hi,
> I read tesseract supports a variety of languages. I converted a Spanish
> text JPG to TIFF and ran tesseract with the Spanish language pack and
> the output text was not even close. Here is the image link:
>
> http://www.libertas.hu/slike/Spanish%20text.jpg
>
> Here was the output .txt (gibberish):
>
> _ físbřts-q:!:
> J4Lçar'tr.r1S.r:t:
> cærrtraø. CSS.
> 22C} (puerta
> ğrcupietaria)
> Er: |::•ler'•c CC
> guæ- En ve
> (SCI a 7O êj
> Scruõs, Eu'1 1
> la psàtirna 1::
> trate: pcvr ÇE
> elgc czlæ irug
> prirrtære pdă
> era la plants
> erreglscics
> \/Istæ sobre
>
>
> What is the trick to getting correct results?
>
> Thank you.
> Darren

Your image has a lot of grey in it. It would be more helpful to see the TIFF file you used than the JPG.

I used the Gimp's 'levels' tool to enhance the image, then changed the image to a 1 bit palette and saved it as a TIFF. You can see the resulting image here:

http://stuporglue.org/downloads/spanish.tif

The command I used was 'tesseract spanish.tif sp -l spa' and I have tesseract 2.0.3 installed.

I Habitsclonas da Ann (AnEls Aparrmsnts; plano en Cøior del centro, G5, 61): Prijsko. 7. D 321~ 220 (peru es ram encontrar u la propletarxa) y 09B503·28E (móvil). En pleno corazón de la ciudad anti- gua. En verano. de 444 a 518 Kn (60 a 70 É) la noche para dos per Sonas. En una Casa adornada por la pátina de los siglos. Muy buen trato por parte de Ana, quien habla algo de inglés. DOS estudios en IE primera planta y un apartamento en la planta baja. impecables y bian arreglados, con ducha y Cocina. Vista sobre las Callejas del barrio.

I think if you do some image cleanup before processing them with tesseract you will get much better results.

--
Michael Moore
-------------------------
Share your families' genealogy and family history books.
It's easy and free : http://bookscanned.com
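
For anyone who wants to script the cleanup instead of doing it by hand, here is a rough sketch in Python of the two steps described above: a levels-style contrast stretch, then a reduction to 1 bit. It is only an illustration, not what the Gimp does internally. The function names, the black and white points (80 and 190) and the threshold (128) are made up for the example, and the "image" is a plain nested list of grey values rather than a real TIFF. The last helper only assembles the same command line quoted above; it does not run tesseract.

```python
def stretch_levels(pixels, black_point, white_point):
    """Remap grey values so black_point becomes 0 and white_point becomes 255.

    `pixels` is a list of rows, each row a list of 0-255 grey values.
    Values outside the [black_point, white_point] range are clamped.
    """
    if white_point <= black_point:
        raise ValueError("white_point must be greater than black_point")
    scale = 255.0 / (white_point - black_point)
    result = []
    for row in pixels:
        new_row = []
        for value in row:
            stretched = round((value - black_point) * scale)
            new_row.append(max(0, min(255, stretched)))
        result.append(new_row)
    return result


def to_one_bit(pixels, threshold=128):
    """Reduce grey values to 1 bit: 1 for paper (light), 0 for ink (dark)."""
    return [[1 if value >= threshold else 0 for value in row] for row in pixels]


def tesseract_command(image_path, output_base, language):
    """Build the argument list for the command quoted in this message."""
    return ["tesseract", image_path, output_base, "-l", language]


if __name__ == "__main__":
    # Hypothetical greyish scan: light-grey paper (180) with mid-grey ink (90).
    scan = [
        [180, 180, 90, 180],
        [180, 90, 90, 180],
    ]
    cleaned = to_one_bit(stretch_levels(scan, black_point=80, white_point=190))
    for row in cleaned:
        print(row)
    print(" ".join(tesseract_command("spanish.tif", "sp", "spa")))
```

With a real scan you would pick the black and white points by looking at the image histogram, which is what the levels dialog shows, and save the 1-bit result as a TIFF before handing it to tesseract.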

