Sorry, there seems to be a problem with the fixed pitch detector. I haven't had a chance to investigate it completely yet, and I don't remember exactly, but it is possible that the fixed pitch detector doesn't work on modern fixed pitch text. On old typewriters, the spaces are the same width as the characters, but in modern (word-processed, laser- or inkjet-printed) fixed pitch text, the width of the spaces is variable, and this is probably preventing the fixed pitch detection algorithm from seeing the text as fixed pitch.

Font size can be calculated as follows:

    ROW* row = ...;
    double pt_size = (row->x_height() + row->ascenders() - row->descenders()) * 72.0 / resolution;

where row can be obtained from a ROW_RES, and resolution is the input resolution of the image in dots per inch. See write_shm_text in output.cpp.
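The font size formula can be tried out in isolation with a small sketch. It does not link against Tesseract: RowMetrics and PointSize are hypothetical names that stand in for the three ROW accessors used above, and it assumes, as the subtraction implies, that descenders() is a negative drop below the baseline.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the values returned by ROW::x_height(),
// ROW::ascenders() and ROW::descenders(), all in pixels.
// Assumption: descenders is negative (a drop below the baseline),
// which is why the formula subtracts it.
struct RowMetrics {
  double x_height;    // height of lower-case letters such as 'x'
  double ascenders;   // extra rise of ascenders ('b', 'd', 'h') above x-height
  double descenders;  // drop of descenders ('g', 'p', 'y'); negative
};

// Converts the row's full descender-to-ascender height from pixels to
// points: 72 points per inch, `resolution` pixels per inch.
double PointSize(const RowMetrics& row, double resolution) {
  assert(resolution > 0.0 && "resolution must be a positive DPI value");
  const double height_px = row.x_height + row.ascenders - row.descenders;
  return height_px * 72.0 / resolution;
}
```

For example, a row with x_height() = 20, ascenders() = 10 and descenders() = -8 pixels in a 300 dpi image spans 38 pixels, i.e. 38 * 72 / 300 = 9.12 pt.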
Ray.

On Sun, Nov 23, 2008 at 2:04 AM, Lincolin <[EMAIL PROTECTED]> wrote:
>
> Dear Ray,
>
> I am waiting for an answer from you on my message below, please; I need it
> urgently:
>
> ========================================================
> I have tried WERD::flag(W_DONT_CHOP) on different documents that
> contain fixed pitch fonts like Courier New or Lucida Console, but this
> flag is ALWAYS (0). Is there anything else that I need to do to get it
> to work?
> Also, is there a way to know the font size of the characters/words?
> ========================================================
>
> Thanks a lot in advance,
> Lincolin.
>
> On Nov 6, 3:36 am, "Ray Smith" <[EMAIL PROTECTED]> wrote:
> > Since Tesseract never got used in an HP product, none of these things were
> > ever necessary enough to add.
> > Serif/sans: no, but maybe in 3.0.
> > Sub/super: no. You would have to evaluate that yourself from the bounding box.
> > Underlined: no, but it does detect underlines, so there is a chance the
> > information can be recovered.
> > Strikethrough: no, and not much hope either.
> > Ray.
> >
> > On Wed, Nov 5, 2008 at 5:13 AM, Lincolin <[EMAIL PROTECTED]> wrote:
> > > Thanks a lot Ray, but is there any way to get the following properties
> > > too?
> > > - Serif and Sans-Serif.
> > > - Subscript and Superscript.
> > > - Underlined.
> > > - Strikethrough.
> > >
> > > Thanks a lot for your reply Ray.
> > > Lincolin
> > >
> > > On Nov 5, 8:35 am, "Ray Smith" <[EMAIL PROTECTED]> wrote:
> > > > The bold and italic indicators are currently incorrect. In 3.0 this
> > > > should be fixed. The fixed pitch vs. proportional indicator is reliable,
> > > > though. WERD::flag(W_DONT_CHOP) indicates fixed pitch.
> > > > Ray.
> > > >
> > > > On Mon, Nov 3, 2008 at 9:30 AM, Lincolin <[EMAIL PROTECTED]> wrote:
> > > > > I have been trying to get font information from Tesseract, such as
> > > > > proportional, serif, sans-serif, bold, italic, etc. I have tried to
> > > > > trace the values Tesseract returns for some of these properties, such
> > > > > as italic, bold and proportional, inside the WERD_RES structure, but
> > > > > I didn't understand what these values mean, since they are not
> > > > > booleans and they take different values (sometimes negative ones).
> > > > > Does anyone know a way to get this information in order to create the
> > > > > proper font?
> > > > >
> > > > > Thanks a lot in advance,
> > > > > Lincolin

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [EMAIL PROTECTED]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en
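In the quoted thread, Ray notes that Tesseract has no subscript/superscript indicator and that you would have to evaluate it yourself from the bounding box. A crude sketch of such a check follows, assuming you already have each word's bounding box plus the row's baseline and x-height in pixels. It is not part of Tesseract: ClassifyScript, ScriptPos and the 0.25 tolerance are hypothetical choices, and it assumes y grows upward (a bottom-left origin).

```cpp
#include <cassert>

// Vertical position of a word relative to its text row.
enum class ScriptPos { kNormal, kSuperscript, kSubscript };

// Hypothetical heuristic (not a Tesseract API): compares a word's bounding
// box with the row's baseline and x-height, all in pixels.
// Assumes y grows upward; flip the comparisons for top-left coordinates.
// `tolerance` is a fraction of the x-height used as a safety margin.
ScriptPos ClassifyScript(double box_bottom, double box_top, double baseline,
                         double x_height, double tolerance = 0.25) {
  assert(x_height > 0.0 && box_top >= box_bottom);
  const double shift = tolerance * x_height;
  // Raised text: the whole box starts clearly above the baseline.
  if (box_bottom > baseline + shift) return ScriptPos::kSuperscript;
  // Lowered text: the box dips below the baseline but never reaches the
  // x-height line, so it is not just an ordinary word with descenders.
  if (box_bottom < baseline - shift && box_top < baseline + x_height - shift)
    return ScriptPos::kSubscript;
  return ScriptPos::kNormal;
}
```

For example, with a baseline at y = 100 and an x-height of 20 px, a box spanning y = 112..130 is reported as superscript and one spanning y = 92..110 as subscript, while a word with descenders spanning y = 92..128 stays normal. Small punctuation such as a raised dash would also be flagged as superscript, so treat the result as a hint only.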

