limit output to ASCII charset

2010-05-25 Thread haratron
http://www.linux.com/archive/feed/57222 "Also, it can generate output only in the US-ASCII character set, so glyphs with accent marks or other unsupported attributes will probably be reproduced incorrectly." Which option makes it limit output to the ASCII charset only? Some letters such as

Re: get region size

2010-05-25 Thread haratron
I searched a lot and found this: tesseract image.tif boxes batch.nochop makebox If I invoke that, I get a boxes.txt file with what appear to be coordinates. But they are too large. I read somewhere that tesseract computes the coordinates from the bottom of the image and not from the top left
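A minimal sketch of the coordinate conversion discussed above, assuming each line of the box file has the form "symbol left bottom right top [page]" with y values measured from the bottom edge of the image. The function names and the sample line are hypothetical, and the image height has to be supplied by the caller.

```python
def box_to_top_left(left, bottom, right, top, image_height):
    """Convert a bottom-origin box to (x, y, width, height) with a top-left origin."""
    x = left
    y = image_height - top      # distance from the top edge of the image
    width = right - left
    height = top - bottom
    return x, y, width, height


def parse_box_line(line, image_height):
    """Parse one box-file line (assumed layout: symbol left bottom right top [page])."""
    parts = line.split()
    symbol = parts[0]
    left, bottom, right, top = (int(p) for p in parts[1:5])
    return symbol, box_to_top_left(left, bottom, right, top, image_height)


# Hypothetical box line for a 100-pixel-tall image.
print(parse_box_line("T 10 80 30 95 0", image_height=100))
```

If the converted values still look too large, the assumption about the file layout or the image height is probably wrong for that Tesseract version.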

Re: limit output to ASCII charset

2010-05-26 Thread haratron
(77yrsold) On Wed, May 26, 2010 at 8:39 AM, nguyenq nguyen...@gmail.com wrote: You can perform some text manipulations in post-processing steps to strip out diacritical marks to leave only the base ASCII characters behind. On May 25, 3:34 pm, haratron harat...@gmail.com wrote: http
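The post-processing step suggested in the quoted reply could look roughly like this. It is a sketch using Python's standard unicodedata module, not something taken from the thread; strip_diacritics is a hypothetical helper name. Characters that have no ASCII base form after decomposition (for example Greek letters or "ø") are simply dropped.

```python
import unicodedata


def strip_diacritics(text):
    """Remove combining marks and return only the ASCII characters that remain."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Anything still outside ASCII has no base ASCII letter and is discarded.
    return without_marks.encode("ascii", "ignore").decode("ascii")


print(strip_diacritics("café naïve résumé"))
```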

OCR forum

2010-08-06 Thread haratron
I'd like to know if there's an OCR forum and/or IRC channel where people can ask/answer OCR related questions. Does anyone know if something like that exists? -- You received this message because you are subscribed to the Google Groups tesseract-ocr group. To post to this group, send email to

Re: OCR of Screenshots

2010-09-08 Thread haratron
I'm also interested in this topic. I have a couple of questions: 1. How can I calculate the ideal image size (300dpi?) to feed to tesseract? I mean, how do I identify how much scaling the image needs, before the OCR procedure. 2. I'm currently using ImageMagick's convert program for scaling and
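For the first question, one simple way to reason about the scaling is a ratio of resolutions. This is only a sketch: it assumes the effective resolution of the source image is known (96 dpi below is an assumed value for a screenshot, not a fact from the thread), and it takes the 300 dpi target from the question. The function names are hypothetical.

```python
def scale_factor_for_dpi(source_dpi, target_dpi=300):
    """Return the factor by which to scale an image from source_dpi to target_dpi."""
    if source_dpi <= 0:
        raise ValueError("source_dpi must be positive")
    return target_dpi / source_dpi


def scaled_size(width, height, source_dpi, target_dpi=300):
    """Return the pixel dimensions after scaling to the target resolution."""
    factor = scale_factor_for_dpi(source_dpi, target_dpi)
    return round(width * factor), round(height * factor)


# Assumed example: an 800x600 screenshot treated as 96 dpi.
factor = scale_factor_for_dpi(96)
print(factor, scaled_size(800, 600, 96))
```

The factor can be turned into a percentage (factor * 100) and handed to whatever scaling tool is in use.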

certainty()

2010-10-05 Thread haratron
I'm using tesseract 3.00 with hOCR output and I get the xocr_word among other things. Example: <span class='xocr_word' id='xword_1_5' title='x_wconf -4'>testing</span> The x_wconf attribute is for certainty of the result. It is calculated through a certainty() function, from what I saw in
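To inspect the x_wconf values in bulk, the word spans can be pulled out of the hOCR text. The sketch below assumes the markup looks like <span class='xocr_word' id='xword_1_5' title='x_wconf -4'>testing</span>, with class before title; that attribute layout is an assumption based on the example, and a real HTML parser would be more robust than this regular expression. The names WCONF_PATTERN and extract_word_confidences are hypothetical.

```python
import re

# Assumed layout: class='xocr_word' appears before title='x_wconf <number>'.
WCONF_PATTERN = re.compile(
    r"""<span[^>]*class=['"]xocr_word['"][^>]*"""
    r"""title=['"]x_wconf\s+(-?\d+(?:\.\d+)?)['"][^>]*>(.*?)</span>""",
    re.DOTALL,
)


def extract_word_confidences(hocr):
    """Return a list of (word_text, x_wconf) pairs found in an hOCR string."""
    return [
        (match.group(2), float(match.group(1)))
        for match in WCONF_PATTERN.finditer(hocr)
    ]


sample = "<span class='xocr_word' id='xword_1_5' title='x_wconf -4'>testing</span>"
print(extract_word_confidences(sample))
```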

Re: certainty()

2010-10-05 Thread haratron
Thank you On Wed, Oct 6, 2010 at 3:26 AM, Jimmy O'Regan jore...@gmail.com wrote: On 5 October 2010 23:43, haratron harat...@gmail.com wrote: I'm using tesseract 3.00 with hOCR output and I get the xocr_word among other things. Example: span class='xocr_word' id='xword_1_5' title=x_wconf

Re: makebox alternatives

2010-10-05 Thread haratron
Thank you Jimmy. batch and nobatch are empty and batch.nochop contains: chop_enable 0 wordrec_enable_assoc 0 What do these do? On Wed, Oct 6, 2010 at 3:10 AM, Jimmy O'Regan jore...@gmail.com wrote: On 5 October 2010 23:17, haratron harat...@gmail.com wrote: I'm trying to figure out the way

Re: Bank Card Embossing Characters Recognition

2012-12-15 Thread haratron
Hello Neo, which SWT implementation did you use? There are several out there and I haven't found one that produces your result yet. Thanks On Thu, Dec 13, 2012 at 1:24 PM, Dmitri Silaev daemons2...@gmail.com wrote: Neo Song, There are two usual approaches to problems like yours. The

[tesseract-ocr] OCR forum/mailing list

2016-12-16 Thread haratron
Is there a generic OCR or document image analysis forum or mailing list somewhere? Something that's not limited to tesseract.

[tesseract-ocr] text line overlay

2016-12-21 Thread haratron
I'm using this snippet to crop an input image into textlines: Boxa* boxes = api->GetComponentImages(tesseract::RIL_TEXTLINE, true, false, 0, NULL, NULL, NULL); for (int i = 0; i < boxes->n; i++) { BOX* box = boxaGetBox(boxes, i, L_CLONE); PIX* pixd= pixClipRectangle(image, box, NULL);

Re: [tesseract-ocr] Re: warped text lines

2016-12-21 Thread haratron
No, I want to dewarp the warped lines of a page of a book. The warped lines are due to perspective distortion (picture acquired with the camera of a mobile phone) and curvature of the book. RIL_WORD or RIL_SYMBOL wouldn't help with that. On Thu, Dec 22, 2016 at 6:50 AM, Junmock Lee

[tesseract-ocr] warped text lines

2016-12-21 Thread haratron
Does tesseract provide a way to dewarp warped text lines?

[tesseract-ocr] find blocks of text

2017-06-27 Thread haratron
How can I find blocks of text (not paragraphs necessarily) with tesseract? If not possible with tesseract, do you know of any other tool that can do this? I want to do OCR zoning.