You were talking about Canny, Sobel, etc., and these do indeed relate to edge detection in its usual sense (http://en.wikipedia.org/wiki/Edge_detection). In that sense, Tesseract does not do any edge detection.
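To make the distinction concrete, here is a minimal sketch in Python. It is not Tesseract's code, and the function name crack_edges is hypothetical. It assumes the image is already binarized (1 = foreground, 0 = background) and simply collects the unit "cracks" between a foreground pixel and a background neighbour. No gradient operator such as Sobel or Canny is involved, and the sketch does not chain the cracks into ordered outlines.

```python
def crack_edges(image):
    """Return the set of unit cracks separating foreground (1) from background (0).

    Each crack is a pair of grid-corner coordinates ((x0, y0), (x1, y1)).
    Pixels outside the image are treated as background.
    Illustrative only: this is not how Tesseract's block_edges() is written.
    """
    height = len(image)
    width = len(image[0]) if height else 0

    def is_foreground(x, y):
        return 0 <= x < width and 0 <= y < height and image[y][x] == 1

    cracks = set()
    for y in range(height):
        for x in range(width):
            if not is_foreground(x, y):
                continue
            # Pixel (x, y) occupies the square with corners (x, y) and (x + 1, y + 1).
            if not is_foreground(x, y - 1):  # top side
                cracks.add(((x, y), (x + 1, y)))
            if not is_foreground(x, y + 1):  # bottom side
                cracks.add(((x, y + 1), (x + 1, y + 1)))
            if not is_foreground(x - 1, y):  # left side
                cracks.add(((x, y), (x, y + 1)))
            if not is_foreground(x + 1, y):  # right side
                cracks.add(((x + 1, y), (x + 1, y + 1)))
    return cracks


# A 3x3 ring: one connected component with a hole in the middle.
ring = [
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
]
print(len(crack_edges(ring)))  # 12 outer cracks + 4 around the hole = 16
```

Note that the hole contributes its own closed boundary, which is consistent with a character containing a closed curve yielding an inner outline as well as an outer one.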
Yes, one might call the process of finding CC contours in a binary image edge detection. But treating it as conventional edge detection would make the task completely degenerate, so applying the approaches of conventional edge detection would be unreasonable, and some of them would be unusable altogether.

Why is there a comment in the source code calling it an "edge detector" when that term has another common meaning? That question should be addressed to the developers. I suppose it is because internally they referred to CC contours as "edges" and called their method of contour extraction "crack edges". I would refrain from considering myself an authority on naming and terminology, though.

What you have shown in your image is not what is produced by extract_edges() or block_edges(). Those build completely different structures, similar to what is commonly known as crack-coded CC boundaries.

Warm regards,
Dmitri Silaev
www.CustomOCR.com

On Saturday, June 23, 2012 2:47:04 PM UTC+4, shahin youssefi wrote:
>
> Dmitri, you are correct, this function only sets the bounding box of them,
> not exactly the CCs.
> If the character has a closed curve in it, the inner area is returned as
> an outline, for example [this] <http://i49.tinypic.com/dh35sx.png>.
> I've shown the result of "extract_edges" in green lines.
>
> On Saturday, June 23, 2012 1:40:48 PM UTC+4:30, Dmitri Silaev wrote:
>>
>> block_edges() has nothing to do with edge detection. Tesseract does
>> not use it at all. It first binarizes entire images then extracts
>> connected components (CCs). block_edges() is called to extract CCs'
>> outlines from a binarized image.
>>
>> Warm regards,
>> Dmitri Silaev
>> www.CustomOCR.com
>>
>>
>> On Sat, Jun 23, 2012 at 10:43 AM, shahin youssefi
>> <[email protected]> wrote:
>> > Hello dear friends,
>> > I wonder if anybody knows what edge detection algorithm
>> > tesseract 3.01 utilizes when finding connected components?
>> > More specifically, in the edgblob.cpp file there is a function called
>> > "extract_edges" in which a function named "block_edges" is called,
>> > which is responsible for extracting edges and finding the outline of a block.
>> > Correct me if I'm wrong, but it seems that "block_edges" doesn't use
>> > famous edge-detection methods like Canny, Sobel or Prewitt.
>> > Thanks in advance.
>> >
>> > --
>> > You received this message because you are subscribed to the Google
>> > Groups "tesseract-ocr" group.
>> > To post to this group, send email to [email protected]
>> > To unsubscribe from this group, send email to
>> > [email protected]
>> > For more options, visit this group at
>> > http://groups.google.com/group/tesseract-ocr?hl=en

