This is something that I would like to use too.

In my testing so far, the interword space is sometimes dropped
altogether for Hindi text (at random, as far as I can tell).

Also, when a paragraph starts with an indented line, the
segmentation goofs up and that line is not recognized correctly.

Maybe there are some config variables that I need to tweak to fix this.
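
To make the quoted change to GetUTF8Text() below concrete, here is a
minimal standalone sketch of the same idea in C++. Word, leading_spaces
and JoinPreservingSpaces are hypothetical names for illustration only,
not Tesseract types or APIs. The clamp to at least one space between
words is my own assumption: a count of 0 might be one way a space could
vanish with the patched loop, but I have not confirmed that this is
what happens for Hindi.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a recognized word; NOT a Tesseract type.
struct Word {
    std::string text;    // UTF-8 text of the word
    int leading_spaces;  // number of blanks detected before the word
};

// Joins the words of one text line, emitting leading_spaces blanks
// before each word. Every word after the first gets at least one
// blank (an assumption of this sketch), so a count of 0 cannot glue
// two words together. Negative counts are treated as 0.
std::string JoinPreservingSpaces(const std::vector<Word>& words) {
    std::string out;
    for (std::size_t i = 0; i < words.size(); ++i) {
        int n = std::max(words[i].leading_spaces, 0);
        if (i > 0) {
            n = std::max(n, 1);
        }
        out.append(static_cast<std::size_t>(n), ' ');
        out += words[i].text;
    }
    return out;
}
```

Because the output is a std::string, it grows as needed. A version that
writes through a raw pointer into a preallocated buffer would also need
that buffer sized for the extra blanks.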

Shree Devi Kumar
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com


On Mon, Jun 10, 2013 at 6:07 PM, Nick White <[email protected]> wrote:

> Hi Eric,
>
> Thanks for this posting. Out of curiosity, why do you need to
> preserve multiple spaces?
>
> Do you think you could update the code to allow a new configuration
> variable? If you did, and posted the patch to the issues page, I
> expect it would be accepted, as this sounds like the sort of thing
> that is useful to be able to do.
>
> Nick
>
> On Sat, Jun 08, 2013 at 01:50:11PM -0700, [email protected] wrote:
> > I found the code Ray referred to back in '09. It is now in GetUTF8Text(). In
> > baseapi.cpp in TessBaseAPI::GetUTF8Text I changed:
> >
> >     *ptr++ = ' ';
> >
> > to
> >
> >     {
> >       int i ;
> >       for ( i = 0 ; i < word->word->space() ; i++ )
> >         *ptr++ = ' ';
> >     }
> >
> > This added back in the multiple spaces as advertised. The results are a bit
> > unpredictable (as Ray warned back in '09).
> >
> > I'll keep poking at it.
> >
> > Eric
> >
> >
> > On Saturday, June 8, 2013 10:37:20 AM UTC-4, [email protected] wrote:
> >
> >     I need to maintain the (multiple) spaces in my output document. About 5
> >     years ago someone asked how to do this and Ray posted a suggestion. That
> >     suggestion does not appear to correspond to the current source code.
> >
> >     Can anyone suggest how I can maintain word spacing both before the first
> >     word on a line (indentation) as well as between words within a line?
> >
> >     I can force the text in the input image to have fixed spacing.
> >
> >     Ideally, there is a command line switch or a config item that will do what
> >     I need, but I am not averse to modifying the code if necessary.
> >
> >     Thanks,
> >     Eric
> >
> >
> > --
> > --
> > You received this message because you are subscribed to the Google
> > Groups "tesseract-ocr" group.
> > To post to this group, send email to [email protected]
> > To unsubscribe from this group, send email to
> > [email protected]
> > For more options, visit this group at
> > http://groups.google.com/group/tesseract-ocr?hl=en
> >
> > ---
> > You received this message because you are subscribed to the Google Groups
> > "tesseract-ocr" group.
> > To unsubscribe from this group and stop receiving emails from it, send an email
> > to [email protected].
> > For more options, visit https://groups.google.com/groups/opt_out.
> >
> >
>
