Aww... I tried removing the '\n' line breaks manually. However, for some 
articles the paragraph break is itself only a single '\n' line break, so 
if I remove those too with a find/replace I lose the paragraph breaks. How 
does VietOCR solve this issue?
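
One heuristic that might handle the single-'\n' case, sketched in Python. This is only a guess at a workable rule, not a description of what VietOCR actually does. The function name `unwrap` and the `short_ratio` parameter are made up for illustration. The assumption is that a blank line always ends a paragraph, and a single line break ends one only when the line is noticeably shorter than the longest line and finishes with sentence-ending punctuation.

```python
def unwrap(text, short_ratio=0.7):
    """Join wrapped OCR lines into paragraphs (heuristic sketch)."""
    # Closing quotes/brackets that may follow the final full stop.
    closers = "\"')\u201d\u2019"
    lines = [ln.strip() for ln in text.splitlines()]
    # Use the longest line as an estimate of the column width.
    width = max((len(ln) for ln in lines), default=0)
    paragraphs = []
    current = ""
    for ln in lines:
        if not ln:
            # A blank line is always a paragraph break.
            if current:
                paragraphs.append(current)
                current = ""
            continue
        if not current:
            current = ln
        elif current.endswith("-"):
            # Keep the hyphen and join directly: "Super-" + "Crew".
            current += ln
        else:
            current += " " + ln
        # Assumed rule: a short line that ends a sentence closes the paragraph.
        ends_sentence = ln.rstrip(closers).endswith((".", "!", "?"))
        if ends_sentence and len(ln) < short_ratio * width:
            paragraphs.append(current)
            current = ""
    if current:
        paragraphs.append(current)
    return "\n\n".join(paragraphs)
```

Something like `print(unwrap(open("out.txt", encoding="utf-8").read()))` would apply it to a Tesseract output file (the file name is hypothetical). The rule will misfire when a paragraph's last line happens to fill the column, or when a mid-paragraph sentence ends on a short line, so `short_ratio` probably needs tuning per document.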

On Thursday, August 7, 2014 7:23:27 AM UTC+8, Quan Nguyen wrote:
>
> I'm afraid not. You can use any programming editor that supports Regex 
> find/replace to do it for you, or use a tool such as VietOCR 
> <http://vietocr.sf.net> to remove line breaks from the output text.
>
> On Wednesday, August 6, 2014 10:51:34 AM UTC-5, Bruce wrote:
>>
>> For example with the image attached, I get the output:
>>
>>    - Chapter One
>>    - 
>>    - A royal-red Ford F—150 Super-
>>    - Crew rolled through the streets
>>    - of Albany, Georgia. The pickup’s
>>    - driver brimmed with optimism, so
>>    - much that he couldn’t possibly
>>    - foresee the battles about to hit
>>    - his hometown.
>>    - 
>>    - Life here is going to be good,
>>    - thirty—seven—year—old Nathan
>>    - Hayes told himself. After eight
>>    - years in Atlanta, Nathan had
>>    - come home to Albany, three
>>    - hours south, with his wife and
>>
>> Is there a way to get output like the following, without the line breaks 
>> within a paragraph?
>>
>>    - Chapter One
>>    - 
>>    - A royal-red Ford F—150 Super-Crew rolled through the streets of 
>>    Albany, Georgia. The pickup’s driver brimmed with optimism, so much that 
>>    he couldn’t possibly foresee the battles about to hit his hometown.
>>    - 
>>    - Life here is going to be good, thirty—seven—year—old Nathan Hayes 
>>    told himself. After eight years in Atlanta, Nathan had come home to 
>>    Albany, three hours south, with his wife and
>>    
>> Thanks in advance!
>>
>
