Hi all, I have the following problem with tesseract's output:
I have documents with an address field such as:

Lastname        more information
Firstname       01.01.89
Street No
D 12345 Town    more information

I just need the address in 4 lines, without the birthdate.

When I call tesseract via shellexec with a uzn zone file, everything works fine, but I don't know when tesseract has finished, so I run into timing problems on older computers. Now I would like to do it using the Windows DLL instead.

If I use recognize_all_words, the result seems to be OK, but it contains all of the information, so the address I need is quite difficult to extract. If I use recognize_a_block, the resulting string contains only the address information, but not in 4 lines. The result looks like this:

LastnameFirstnameStreetNo <nl>
D 12345 Town

There is only one linefeed, after "Street No". I think the reason is that there is no additional information in that line.

Does anyone have an idea what I can do, or what I did wrong? I know the EANYCODE_CHAR structure has the box coordinates of each letter, so I could look for new lines myself, but I think there must be an easier way to get the correct result.

Thanks in advance and cheers,
Chris from Aachen, Germany
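
P.S. To make the fallback idea concrete, here is roughly what I mean by looking for new lines myself from the letter boxes. It is only a sketch in Python under my own assumptions: CharBox is a hypothetical stand-in for the per-letter box data (it is not the real EANYCODE_CHAR layout), y is assumed to grow downwards, and the two thresholds (0.6 of the median letter height, 0.5 of the median letter width) are guesses that would need tuning on real scans.

```python
from dataclasses import dataclass
from statistics import median
from typing import List


@dataclass
class CharBox:
    """Hypothetical per-letter record: the character plus its bounding box."""
    char: str
    left: int
    right: int
    top: int      # assumption: y grows downwards, so top < bottom
    bottom: int


def _center(box: CharBox) -> float:
    """Vertical center of a letter box."""
    return (box.top + box.bottom) / 2


def group_into_lines(chars: List[CharBox]) -> List[str]:
    """Group letter boxes into text lines, top to bottom, left to right."""
    if not chars:
        return []

    # Thresholds derived from typical letter size (assumed values, tune them).
    line_tolerance = 0.6 * median(abs(c.bottom - c.top) for c in chars)
    space_gap = 0.5 * median(abs(c.right - c.left) for c in chars)

    # Walk the letters from top to bottom and open a new row whenever a
    # letter's center is too far from the mean center of the current row.
    rows: List[List[CharBox]] = []
    for box in sorted(chars, key=_center):
        if rows:
            row = rows[-1]
            row_center = sum(_center(m) for m in row) / len(row)
            if abs(_center(box) - row_center) <= line_tolerance:
                row.append(box)
                continue
        rows.append([box])

    # Inside each row, order letters by x and insert a space at wide gaps.
    lines: List[str] = []
    for row in rows:
        row.sort(key=lambda c: c.left)
        text = row[0].char
        for prev, cur in zip(row, row[1:]):
            if cur.left - prev.right > space_gap:
                text += " "
            text += cur.char
        lines.append(text)
    return lines
```

The idea would be to fill a list of CharBox from the recognition result and join the returned strings with line breaks. Very small marks such as apostrophes or commas sit off the line center, so they may need a looser tolerance than the one assumed here.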

