The first part trims all tab and space characters at the beginning and end 
of each line, and at the beginning and end of the text as a whole.

The second part replaces each single line-feed (newline) character with a 
space. So if you want to retain a paragraph break, mark it with two or more 
consecutive LF characters.
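
To make the two steps concrete, here is a minimal sketch that applies the same two replaceAll calls to a short sample. The class name LineBreakDemo and the sample string are made up for illustration; only the two regular expressions come from the function quoted in this thread.

```java
public class LineBreakDemo {

    // Same two-step replacement as the removeLineBreaks function quoted in this thread.
    static String removeLineBreaks(String text) {
        return text
                // Step 1: drop tabs/spaces right after a line feed or at the start
                // of the text, and right before a line feed or at the end of the text.
                .replaceAll("(?<=\n|^)[\t ]+|[\t ]+(?=$|\n)", "")
                // Step 2: a line feed with a non-newline character on both sides
                // becomes a space. In "\n\n" each line feed has another line feed
                // as a neighbour, so the paragraph break is left alone.
                .replaceAll("(?<=.)\n(?=.)", " ");
    }

    public static void main(String[] args) {
        // Hypothetical OCR-like input: stray whitespace, wrapped lines,
        // and blank lines between paragraphs.
        String ocr = "  Chapter One  \n"
                + "\n"
                + "A royal-red Ford\t\n"
                + "rolled through\n"
                + "the streets.\n"
                + "\n"
                + "Life here";
        System.out.println(removeLineBreaks(ocr));
    }
}
```

With this input the heading, the joined sentence "A royal-red Ford rolled through the streets." and "Life here" should each come out on their own line, still separated by blank lines. Note that a paragraph break made of only one LF cannot be told apart from an ordinary line wrap and will be joined as well.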

On Sunday, August 10, 2014 5:26:14 AM UTC-5, Bruce wrote:
>
> It works wonderfully. Can you explain the regex statement in more detail? 
> I can't work out what the first regex statement is matching. 
>
> Thanks again for sharing your wonderful solution!!
>
> On Saturday, August 9, 2014 9:32:00 AM UTC+8, Quan Nguyen wrote:
>>
>> It uses a suitable regex statement. The following is the Java function 
>> that it uses:
>>
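>>     /**
>>      * Removes line breaks within paragraphs.
>>      * @param text the text to process
>>      * @return the text with each single line break replaced by a space
>>      */
>>     public static String removeLineBreaks(String text) {
>>         return text
>>             // trim tabs and spaces at the start and end of each line
>>             .replaceAll("(?<=\n|^)[\t ]+|[\t ]+(?=$|\n)", "")
>>             // replace each single line feed with a space
>>             .replaceAll("(?<=.)\n(?=.)", " ");
>>     }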
>>
>>
>>
>> On Friday, August 8, 2014 10:49:27 AM UTC-5, Bruce wrote:
>>>
>>> Aww.. I tried removing the '\n' line breaks manually. However, for some 
>>> articles the paragraph break is still just a single '\n' line break, so 
>>> if I remove that too with a find/replace I lose the paragraph break. How 
>>> did VietOCR solve this issue?
>>>
>>> On Thursday, August 7, 2014 7:23:27 AM UTC+8, Quan Nguyen wrote:
>>>>
>>>> I'm afraid not. You can use any programming editor that supports Regex 
>>>> find/replace to do it for you, or use a tool such as VietOCR 
>>>> <http://vietocr.sf.net> to remove line breaks from the output text.
>>>>
>>>> On Wednesday, August 6, 2014 10:51:34 AM UTC-5, Bruce wrote:
>>>>>
>>>>> For example with the image attached, I get the output:
>>>>>
>>>>>    - Chapter One
>>>>>    - 
>>>>>    - A royal-red Ford F—150 Super-
>>>>>    - Crew rolled through the streets
>>>>>    - of Albany, Georgia. The pickup’s
>>>>>    - driver brimmed with optimism, so
>>>>>    - much that he couldn’t possibly
>>>>>    - foresee the battles about to hit
>>>>>    - his hometown.
>>>>>    - 
>>>>>    - Life here is going to be good,
>>>>>    - thirty—seven—year—old Nathan
>>>>>    - Hayes told himself. After eight
>>>>>    - years in Atlanta, Nathan had
>>>>>    - come home to Albany, three
>>>>>    - hours south, with his wife and
>>>>>
>>>>> Is there a way to get output like the following, without the line 
>>>>> breaks within a paragraph?
>>>>>
>>>>>    - Chapter One
>>>>>    - 
>>>>>    - A royal-red Ford F—150 Super-Crew rolled through the streets of Albany, Georgia. The pickup’s driver brimmed with optimism, so much that he couldn’t possibly foresee the battles about to hit his hometown.
>>>>>    - 
>>>>>    - Life here is going to be good, thirty—seven—year—old Nathan Hayes told himself. After eight years in Atlanta, Nathan had come home to Albany, three hours south, with his wife and
>>>>>    
>>>>> Thanks in advance!
>>>>>
>>>>
