Aww... I tried removing the '\n' line breaks manually. However, for some articles the paragraph break itself is only a single '\n', so if I remove those too with a find/replace, I lose the paragraph breaks. How did VietOCR solve this issue?
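
For what it's worth, here is a rough sketch of one heuristic for this. I don't know what VietOCR does internally, so this is not its method. The function name `unwrap`, the `short_ratio` threshold of 0.75, and the punctuation set are all my own assumptions. The idea: a blank line always ends a paragraph, and a single '\n' is also treated as a paragraph break when the line before it ends a sentence and is clearly shorter than the widest line in the text. A line ending in a hyphen is joined to the next one without a space, keeping the hyphen (as in "Super-Crew").

```python
# Assumed set of characters that can end a sentence (and so a paragraph).
TERMINATORS = '.!?"”'


def unwrap(text, short_ratio=0.75):
    """Join wrapped OCR lines into paragraphs using a simple heuristic.

    short_ratio is a hypothetical tuning knob: a line shorter than
    short_ratio * (widest line) that ends a sentence closes the paragraph.
    """
    lines = [ln.strip() for ln in text.splitlines()]
    widths = [len(ln) for ln in lines if ln]
    if not widths:
        return ""
    full = max(widths)  # widest line approximates the column width

    paragraphs, current = [], ""
    for ln in lines:
        if not ln:
            # A blank line is always a paragraph break.
            if current:
                paragraphs.append(current)
                current = ""
            continue
        if current.endswith("-"):
            current += ln          # "Super-" + "Crew" -> "Super-Crew"
        elif current:
            current += " " + ln    # ordinary wrapped line
        else:
            current = ln
        # Short line that ends a sentence: assume the paragraph ends here.
        if len(ln) < short_ratio * full and ln[-1] in TERMINATORS:
            paragraphs.append(current)
            current = ""
    if current:
        paragraphs.append(current)
    return "\n\n".join(paragraphs)


if __name__ == "__main__":
    sample = (
        "Chapter One\n\n"
        "A royal-red Ford F-150 Super-\n"
        "Crew rolled through the streets\n"
        "of Albany, Georgia. The pickup's\n"
        "hometown.\n"
        "Life here is going to be good,\n"
        "Nathan Hayes told himself."
    )
    print(unwrap(sample))
```

This will misfire when a paragraph's last line happens to fill the column, or when a hyphen at a line end is only a soft line-break hyphen that should be dropped, so the result still needs a quick proofread.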
On Thursday, August 7, 2014 7:23:27 AM UTC+8, Quan Nguyen wrote:
>
> I'm afraid not. You can use any programming editor that supports Regex
> find/replace to do it for you, or use a tool such as VietOCR
> <http://vietocr.sf.net> to remove line breaks from the output text.
>
> On Wednesday, August 6, 2014 10:51:34 AM UTC-5, Bruce wrote:
>>
>> For example with the image attached, I get the output:
>>
>> - Chapter One
>> -
>> - A royal-red Ford F—150 Super-
>> - Crew rolled through the streets
>> - of Albany, Georgia. The pickup’s
>> - driver brimmed with optimism, so
>> - much that he couldn’t possibly
>> - foresee the battles about to hit
>> - his hometown.
>> -
>> - Life here is going to be good,
>> - thirty—seven—year—old Nathan
>> - Hayes told himself. After eight
>> - years in Atlanta, Nathan had
>> - come home to Albany, three
>> - hours south, with his wife and
>>
>> Is there a way to make the output as the below, without the line breaks
>> within a paragraph?
>>
>> - Chapter One
>> -
>> - A royal-red Ford F—150 Super-Crew rolled through the streets of Albany, Georgia. The pickup’s driver brimmed with optimism, so much that he couldn’t possibly foresee the battles about to hit his hometown.
>> -
>> - Life here is going to be good, thirty—seven—year—old Nathan Hayes told himself. After eight years in Atlanta, Nathan had come home to Albany, three hours south, with his wife and
>>
>> Thanks in advance!
>>

