Rob,

I am able to view the characters in this email itself.

Regarding the problem you described:

"Typically in text editors (including Notepad++, UltraEdit, MS Word, Notepad, etc.), an unrecognized character will be displayed as a simple box. This is not readable. So, to verify your results, especially while training, you need to check how accurate the results came out."

Have you solved the problem?

Regards
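
One way to verify OCR output without depending on a font is to print each character's code point and Unicode name, much as the conversion.php listing quoted below does. The following is only a minimal sketch, assuming Python's standard unicodedata module; the function name describe_text and the fallback label are made up for illustration:

```python
import unicodedata


def describe_text(text):
    """Return one 'U+XXXX NAME' row per character in text.

    Hypothetical helper: it lets a reviewer read what a character is
    even when the editor can only draw a "little box" for it.
    """
    rows = []
    for ch in text:
        # Control characters such as U+000A have no Unicode name,
        # so a placeholder label is used instead of raising an error.
        name = unicodedata.name(ch, "[control or unnamed]")
        rows.append(f"U+{ord(ch):04X} {name}")
    return rows


# Example: the first three circled letters from the sample, plus a newline.
for row in describe_text("\u24ba\u24bb\u24c1\n"):
    print(row)
```

A listing like this could be shown next to the OCR text during manual validation, so an unprintable character is still identifiable by name.
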
On Mon, May 4, 2009 at 10:54 PM, 74yrs old <[email protected]> wrote:

> ⒺⒻⓁⓁⓊⓅⓈⓈⓉⓊ
> U+24BA U+24BB U+24C1 U+24C1 U+24CA U+24C5 U+24C8 U+24C8 U+24C9 U+24CA
> U+000A
>
> E F L L U P S S T U http://rishida.net/scripts/uniview/conversion.php
> 000A [control]
> 0020 SPACE
> 0045 E LATIN CAPITAL LETTER E
> 0020 SPACE
> 0046 F LATIN CAPITAL LETTER F
> 0020 SPACE
> 004C L LATIN CAPITAL LETTER L
> 0020 SPACE
> 004C L LATIN CAPITAL LETTER L
> 0020 SPACE
> 0055 U LATIN CAPITAL LETTER U
> 0020 SPACE
> 0050 P LATIN CAPITAL LETTER P
> 0020 SPACE
> 0053 S LATIN CAPITAL LETTER S
> 0020 SPACE
> 0053 S LATIN CAPITAL LETTER S
> 0020 SPACE
> 0054 T LATIN CAPITAL LETTER T
> 0020 SPACE
> 0055 U LATIN CAPITAL LETTER U
> 000A [control]
> 000A [control]
>
> On Mon, May 4, 2009 at 6:46 PM, Rob H. <[email protected]> wrote:
>
>> Copy and paste the following text into the basic notepad application.
>> It will show up as "little boxes".
>> There's a good chance that your web browser doesn't have a unicode
>> enabled font, so most of the following characters will display as
>> garbage.
>>
>> The following characters are: circled E, circled F, circled L, circled
>> L, circled U, circled P, circled S, circled S, circled T, circled U
>>
>> ⒺⒻⓁⓁⓊⓅⓈⓈⓉⓊ
>>
>> Or you can copy/paste those into the web app and view them:
>> http://rishida.net/scripts/uniview/uniview.php?codepoints=24BA 24BB
>> 24C1 24C1 24CA 24C5 24C8 24C8 24C9 24CA
>>
>> On May 3, 5:35 am, 74yrs old <[email protected]> wrote:
>> > Thanks. Very good idea. Will you please upload a sample of the "little box"?
>> >
>> > On Sun, May 3, 2009 at 9:21 AM, Rob H. <[email protected]> wrote:
>> >
>> > > I'm training Tess to recognize letters/numbers/symbols/etc. used for
>> > > geometrical tolerancing and annotations (ASME Standard Y14.5).
>> > > A lot of the characters used in the ASME standard are coming from all
>> > > over the unicode tables (although the characters/words are from the
>> > > English language).
>> >
>> > > This is part of a data validation project and I'm using OCR as part of
>> > > the process.
>> > > Since OCR is not 100% accurate, some of the validation will need to be
>> > > done by hand (hopefully as little as possible).
>> > > If the person checking the annotation sees a "little box" (i.e. an
>> > > unprintable character) then it will slow down their job.
>> > > For the moment, I check unprintable characters using the webapp which
>> > > I posted above.
>> > > Once this goes into production, there will be a font (purchased or
>> > > home-brewed) which can correctly draw all the letters/numbers/symbols/etc.
>> >
>> > > On May 2, 7:04 am, 74yrs old <[email protected]> wrote:
>> > > > Hi Rob,
>> > > > I know about conversion.php, which I have been using for a long time
>> > > > for the Kannada project.
>> > > > Will you kindly explain your experiment step by step, with a sample
>> > > > if any? I wanted to have hands-on experience. BTW, which language
>> > > > were you training?
>> > > > Regards,
>> > > > sriranga(76yrs old)
>> >
>> > > > On Sat, May 2, 2009 at 6:37 AM, Rob H. <[email protected]> wrote:
>> >
>> > > > > Also, I got this e-mail from someone named Albert
>> > > > > =========
>> > > > > Hi Rob,
>> >
>> > > > > Reply to your "ps"....
>> >
>> > > > > That doesn't make any sense to me. You are asking for a set of glyphs
>> > > > > that can represent every Unicode character in existence. Not
>> > > > > only would such a file be *HUGE* in size, but I can't see it as
>> > > > > serving any purpose to anyone (other than you, I guess)...
>> >
>> > > > > So you should stop looking for it.
>> >
>> > > > > -
>> > > > > Albert
>> > > > > =========
>> >
>> > > > > Arial Unicode covers ~50K of the ~140K characters defined at
>> > > > > unicode.org. This font file is 22 MB.
>> > > > > Wouldn't a complete unicode font be around 70 MB?
>> >
>> > > > > If you need a general text viewer which can legibly show documents
>> > > > > that contain any number of the valid ~140K characters,
>> > > > > then a complete font would be useful.
>> >
>> > > > > Great advice Albert...*roll eyes*... "stop looking"... how about
>> > > > > something a little more constructive?
>> > > > > maybe you know a strategy of mixing fonts to enable an application to
>> > > > > view all the possible unicode characters?

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en
-~----------~----~----~----~------~----~------~--~---
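
On the question in the oldest quoted message about mixing fonts: one common approach is font fallback, where the application splits text into runs and draws each run with the first font that covers its characters. The sketch below only illustrates that idea. The font names and code point ranges in FONT_COVERAGE are hypothetical placeholders, not data from any real font; an actual viewer would read coverage from each font's character map.

```python
# Hypothetical coverage table: (font name, inclusive code point ranges),
# listed in priority order. These values are illustrative assumptions.
FONT_COVERAGE = [
    ("BasicLatinFont", [(0x0020, 0x007E)]),
    ("EnclosedSymbolsFont", [(0x2460, 0x24FF)]),
]


def pick_font(ch, fonts=FONT_COVERAGE):
    """Return the first font whose ranges contain ch, or None if uncovered."""
    cp = ord(ch)
    for name, ranges in fonts:
        if any(lo <= cp <= hi for lo, hi in ranges):
            return name
    return None


def split_into_runs(text, fonts=FONT_COVERAGE):
    """Group consecutive characters that resolve to the same font.

    Returns a list of (font_name_or_None, substring) pairs. A None entry
    marks characters that would still render as a "little box".
    """
    runs = []
    for ch in text:
        font = pick_font(ch, fonts)
        if runs and runs[-1][0] == font:
            # Same font as the previous character: extend the current run.
            runs[-1] = (font, runs[-1][1] + ch)
        else:
            runs.append((font, ch))
    return runs


for font, chunk in split_into_runs("E \u24ba\u24bb F"):
    print(font, repr(chunk))
```

With this structure no single font has to cover everything; runs that come back with None show exactly which characters still lack a glyph.
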

