It uses a regular expression. Here is the Java function that does the 
work:

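    /**
     * Removes line breaks inside paragraphs; blank-line paragraph breaks
     * are kept.
     *
     * @param text the text to unwrap
     * @return the text with each paragraph joined onto a single line
     */
    public static String removeLineBreaks(String text) {
        // 1. Strip leading and trailing spaces/tabs on every line.
        // 2. Replace a newline that sits between two non-newline
        //    characters with a single space.
        return text.replaceAll("(?<=\n|^)[\t ]+|[\t ]+(?=$|\n)", "")
                   .replaceAll("(?<=.)\n(?=.)", " ");
    }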
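
As a quick check of the function above, here is a small self-contained sketch. The class name LineBreakDemo, the sample string, and the second method are my own assumptions for illustration and are not part of VietOCR as quoted. The second method is a hypothetical variant that also rejoins a word left with a hyphen at the end of a line (such as "Super-" / "Crew" in the sample), which the original function would turn into "Super- Crew".

```java
final class LineBreakDemo {

    // Same two-step regex as the function quoted above.
    static String removeLineBreaks(String text) {
        return text
                .replaceAll("(?<=\n|^)[\t ]+|[\t ]+(?=$|\n)", "")
                .replaceAll("(?<=.)\n(?=.)", " ");
    }

    // Hypothetical variant (an assumption, not VietOCR code): first glue
    // "letter-hyphen NEWLINE letter" together, then unwrap as before.
    static String removeLineBreaksJoiningHyphens(String text) {
        String trimmed = text.replaceAll("(?<=\n|^)[\t ]+|[\t ]+(?=$|\n)", "");
        return trimmed
                .replaceAll("(?<=\\p{L}-)\n(?=\\p{L})", "")
                .replaceAll("(?<=.)\n(?=.)", " ");
    }

    public static void main(String[] args) {
        // Shortened sample modelled on the OCR output in this thread.
        String ocr = "Chapter One\n\n"
                + "A royal-red Ford Super-\n"
                + "Crew rolled through the streets\n"
                + "of Albany, Georgia.\n\n"
                + "Life here is going to be good,";
        System.out.println(removeLineBreaks(ocr));
        System.out.println("----");
        System.out.println(removeLineBreaksJoiningHyphens(ocr));
    }
}
```

Two caveats, as far as I can tell. The regex only treats a lone '\n' as a soft break, so text with "\r\n" line endings would need to be normalized first, and a paragraph break that really is a single '\n' cannot be told apart from a wrapped line. The hyphen variant keeps the hyphen, so a word that was split only for wrapping (say "opti-" / "mism") would come out as "opti-mism".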



On Friday, August 8, 2014 10:49:27 AM UTC-5, Bruce wrote:
>
> Aww.. I tried removing the '\n' line breaks manually, but for some 
> articles the paragraph break is only a single '\n' line break, so 
> if I remove those too with a find/replace I lose the paragraph break. How 
> did VietOCR solve this issue?
>
> On Thursday, August 7, 2014 7:23:27 AM UTC+8, Quan Nguyen wrote:
>>
>> I'm afraid not. You can use any programming editor that supports Regex 
>> find/replace to do it for you, or use a tool such as VietOCR 
>> <http://vietocr.sf.net> to remove line breaks from the output text.
>>
>> On Wednesday, August 6, 2014 10:51:34 AM UTC-5, Bruce wrote:
>>>
>>> For example with the image attached, I get the output:
>>>
>>>    - Chapter One
>>>    - 
>>>    - A royal-red Ford F—150 Super-
>>>    - Crew rolled through the streets
>>>    - of Albany, Georgia. The pickup’s
>>>    - driver brimmed with optimism, so
>>>    - much that he couldn’t possibly
>>>    - foresee the battles about to hit
>>>    - his hometown.
>>>    - 
>>>    - Life here is going to be good,
>>>    - thirty—seven—year—old Nathan
>>>    - Hayes told himself. After eight
>>>    - years in Atlanta, Nathan had
>>>    - come home to Albany, three
>>>    - hours south, with his wife and
>>>
>>> Is there a way to make the output look like the following, without the 
>>> line breaks within a paragraph?
>>>
>>>    - Chapter One
>>>    - 
>>>    - A royal-red Ford F—150 Super-Crew rolled through the streets of 
>>>    Albany, Georgia. The pickup’s driver brimmed with optimism, so much that 
>>>    he couldn’t possibly foresee the battles about to hit his hometown.
>>>    - 
>>>    - Life here is going to be good, thirty—seven—year—old Nathan Hayes 
>>>    told himself. After eight years in Atlanta, Nathan had come home to 
>>>    Albany, three hours south, with his wife and
>>>    
>>> Thanks in advance!
>>>
>>
