The first part trims all tab and space characters at the beginning and end
of each line (and so of the text as a whole).
The second part replaces each single line-feed (newline) character with a
space, effectively joining the lines. So if you want to keep a paragraph
break, mark it with two or more LF characters.
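
To make the two parts easier to see, here is a small sketch that splits the
function into one helper per replaceAll call. The names LineBreakDemo,
trimLineEdges and joinSingleBreaks are made up for this illustration and are
not part of VietOCR, and the sample text is invented. It assumes the input
uses LF-only line endings; CRLF text would probably need normalizing first.

```java
public class LineBreakDemo {

    // Part 1: remove tabs/spaces that follow a newline or the start of the
    // text, and tabs/spaces that precede a newline or the end of the text.
    public static String trimLineEdges(String text) {
        return text.replaceAll("(?<=\n|^)[\t ]+|[\t ]+(?=$|\n)", "");
    }

    // Part 2: replace a newline with a space only when there is a
    // non-newline character on both sides of it, i.e. a single LF between
    // two non-empty lines. Runs of two or more LFs are left untouched.
    public static String joinSingleBreaks(String text) {
        return text.replaceAll("(?<=.)\n(?=.)", " ");
    }

    // Same order of operations as the original removeLineBreaks function.
    public static String removeLineBreaks(String text) {
        return joinSingleBreaks(trimLineEdges(text));
    }

    public static void main(String[] args) {
        // Hypothetical OCR-style input: stray edge whitespace, single LFs
        // inside paragraphs, and a blank line between the two paragraphs.
        String ocr = "  A royal-red  \n\tFord rolled\n\nLife here \nis good  ";

        System.out.println(trimLineEdges(ocr));
        System.out.println("----");
        System.out.println(removeLineBreaks(ocr));
    }
}
```

The trimming has to run first: a space left at the end of a line would
otherwise end up next to the space that replaces the LF.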
On Sunday, August 10, 2014 5:26:14 AM UTC-5, Bruce wrote:
>
> It works wonderfully. Can you explain more about the regex statement? I
> can't understand what the first regex statement is matching against.
>
> Thanks again for sharing your wonderful solution!!
>
> On Saturday, August 9, 2014 9:32:00 AM UTC+8, Quan Nguyen wrote:
>>
>> It employs a proper Regex statement. Following is the function in Java
>> that it uses:
>>
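>> /**
>>  * Removes line breaks.
>>  *
>>  * @param text the text to clean up
>>  * @return the text with line-edge whitespace trimmed and single line
>>  *         breaks replaced by spaces
>>  */
>> public static String removeLineBreaks(String text) {
>>     return text.replaceAll("(?<=\n|^)[\t ]+|[\t ]+(?=$|\n)", "")
>>             .replaceAll("(?<=.)\n(?=.)", " ");
>> }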
>>
>>
>>
>> On Friday, August 8, 2014 10:49:27 AM UTC-5, Bruce wrote:
>>>
>>> Aww.. I tried removing the '\n' line breaks manually, but in some
>>> articles the paragraph break is also just a single '\n' line break, so
>>> if I remove those too with a find/replace I lose the paragraph breaks. How
>>> did VietOCR solve this issue?
>>>
>>> On Thursday, August 7, 2014 7:23:27 AM UTC+8, Quan Nguyen wrote:
>>>>
>>>> I'm afraid not. You can use any programming editor that supports Regex
>>>> find/replace to do it for you, or use a tool such as VietOCR
>>>> <http://vietocr.sf.net> to remove line breaks from the output text.
>>>>
>>>> On Wednesday, August 6, 2014 10:51:34 AM UTC-5, Bruce wrote:
>>>>>
>>>>> For example with the image attached, I get the output:
>>>>>
>>>>> - Chapter One
>>>>> -
>>>>> - A royal-red Ford F—150 Super-
>>>>> - Crew rolled through the streets
>>>>> - of Albany, Georgia. The pickup’s
>>>>> - driver brimmed with optimism, so
>>>>> - much that he couldn’t possibly
>>>>> - foresee the battles about to hit
>>>>> - his hometown.
>>>>> -
>>>>> - Life here is going to be good,
>>>>> - thirty—seven—year—old Nathan
>>>>> - Hayes told himself. After eight
>>>>> - years in Atlanta, Nathan had
>>>>> - come home to Albany, three
>>>>> - hours south, with his wife and
>>>>>
>>>>> Is there a way to make the output as the below, without the line
>>>>> breaks within a paragraph?
>>>>>
>>>>> - Chapter One
>>>>> -
>>>>> - A royal-red Ford F—150 Super-Crew rolled through the streets of
>>>>> Albany, Georgia. The pickup’s driver brimmed with optimism, so much
>>>>> that he couldn’t possibly foresee the battles about to hit his hometown.
>>>>> -
>>>>> - Life here is going to be good, thirty—seven—year—old Nathan
>>>>> Hayes told himself. After eight years in Atlanta, Nathan had come home
>>>>> to Albany, three hours south, with his wife and
>>>>>
>>>>> Thanks in advance!
>>>>>
>>>>
--
You received this message because you are subscribed to the Google Groups
"tesseract-ocr" group.
Visit this group at http://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit
https://groups.google.com/d/msgid/tesseract-ocr/d251dbe2-c42f-4a00-a426-2b7ac2be8984%40googlegroups.com.