I'm having similar issues and hope you get an answer to this question.

Thomas L. Packer
~~~~~~~~~~~~~~~~~~~~
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of groupmeister
Sent: Tuesday, March 09, 2010 9:30 PM
To: ocropus
Subject: introduction and request for help getting up and running

Howdy,

I posted a message a few days ago asking for some help getting up and running, but I haven't seen it show up. I notice now that there is a note about posting an introduction of yourself before posting to the group. Maybe that's why it hasn't shown up. Sorry I didn't notice that before. I've been using Usenet for 21 years and am not used to things working that way. Moderated groups have existed forever, but I hadn't come across one that requires a personal introduction first. I'm happy to do that, though.

I'm a physician, and used to be a computer programmer. I have a medical services software company that may have a use for OCR in medical records management. We are interested in OCRopus. I lead a small development team, and I want to try out OCRopus myself and see what it's capable of. If it looks good enough to help us, I will start some projects on our end to make use of it, which would almost certainly lead to code contributions from us. We have already made significant contributions to a number of other open source projects in the course of our work.

The developer I was going to assign this to looked into things and said he thought the project was in an unstable alpha phase and was not likely to be useful to us. Looking through things myself, I don't think that's true. I also see that the next release (0.5) will be easier to integrate into other software products. Anyway, I want to see for myself whether this can be helpful to us or not.

At any rate, I have been trying to do a proof-of-concept install of OCRopus to evaluate what it can do, and I can't get it to produce non-garbage output. I believe I have followed the step-by-step install instructions for Ubuntu properly.
I'm wondering if there's something I'm missing. Do I need to train the system for a while before using it? It seems to come with pre-formed models. Are there settings I can tweak for processor speed, etc.? Maybe the fact that I'm running it in a VM affects things.

If anyone could help me get up and running, I'd appreciate it. The full details of my installation problem are pasted below, from my first post attempt.

Thanks very much,
Jack

Howdy,

I am wondering if anyone can help me get up and running on Ubuntu 9.1. My company runs some custom-made software packages on our servers, and I'm wondering if there's a chance this OCR package could be useful to us. If so, I may ask our IT people to look at integrating it into some tools we use. There is definitely potential for us to contribute back to the project if we wind up using it. I used to be a hobbyist computer programmer until around 1995, after which I didn't have any time for it. Anyway, sorry for all the irrelevant babble.

I have a VMware "virtual appliance" image of Ubuntu 9.1 desktop, which I downloaded from the operating systems section of the virtual appliance marketplace on VMware's website. I installed ocropus on that virtual Ubuntu appliance, which appeared to go without a hitch, though frankly there were one or two errors that appeared to be non-fatal and which, I apologize, I didn't copy down.

At any rate, I have ocropus fully up and running, but it always comes up with garbage whatever I throw at it, whether with the one-line page command or the recommended book2lines, etc., sequence of commands.

My questions:

(1) Does it come trained on a recognition model out of the box, or is it necessary to train it on a model to get it to work?
(2) When I test it out of the box on a sample document that contains what I think is easily recognizable text, I get this:

d...@ubuntu:~/ocropus$ ocropus page test.png
[info] got 895 bboxes
[info] all = 0
-~-- -. . c|T 5G
[error] beam search failed
r2#,:-Y//>Ni| o : %d : 6!5## ## #a Bk|v| %25` ?Yl,?67o7 ao`|2/#|/e 9"q . "|`'+,oo<1|n|c %;: .e;::|:rt: |--- e(< |`42 ?6: 5a: tt+c:~ EMn-- //. ~# ;#; ..%::;:::- "-"##i o,OB< y2~/<X#<, #|e.#--|
[error] beam search failed
#;;;. .;:o::: ew7.$;:L#]i:I,r5t{- ..s7AT -- o3/||/|q ot24..
[error] beam search failed
## a: |;;A,%|2:2:# t Tn%=c~ g;V;#U%-|r:#Y6:" ## # ;aa2<5J"| ::Li%Y:Ic < f? ;;z%;;a-. #`-""9## ` ` @`|h "boo| | RR| su| ##V-.-. ;Jg l e|i ' ` ,"d "||V,,#| |,| ,,t / Z ~ J 4#/ s #g;a :+. c |osr Acct Ac7 ` 1a:#/ ! .Z -2 # / l ~ lll]yl l!l[g# l
[error] beam search failed
.---- #9" "66.` ` eer|#arent a-aa|ohe # i 5`- | c7 R|An wi7Rout covro #j ]l;l+l#l[Ill:l
[error] beam search failed
[error] beam search failed
or. ="n-- '#4=7#%:G;|;7-f:|::b `o|< #~#), >/<, "##,. #-,'~
[error] beam search failed
;<.';:L:#i:|-r5t|-
[error] beam search failed
g;#<;U|%; :uc::-# e ge. ~
[error] beam search failed
,c-.- . . -~-. ---. 9.
..- "` .=9&` ##o$o#>|
vad...@ubuntu:~/ocropus$

(3) When I execute the command to check out the default model, I see:

vad...@ubuntu:~/ocropus$ ocropus cin model'
Linerec
linerec_verbose=0 linerec_grouper=SimpleGrouper linerec_use_reject=1 linerec_use_priors=0
linerec_invert=1 linerec_space_fractile=0.5 linerec_space_min=0.2 linerec_minheight=10
linerec_maxheight=300 linerec_space_max=1.1 linerec_space_yes=1 linerec_maxaspect=1
linerec_segmenter=DpSegmenter linerec_classifier=latin linerec_space_multiplier=2 linerec_extractor=scaledfe
linerec_cpreload=none linerec_space_no=5 linerec_minclass=32 linerec_maxcost=20
linerec_maxrange=5 linerec_minprob=1e-06
segmenter: curved cut segmenter
grouper: SimpleGrouper
counts: 126 2309208
CHARCLASS MODEL MLP
mlp_normalization=-1 mlp_hidden_hi=80 mlp_noopt=0 mlp_hidden_lo=20
mlp_cv_max=5000 mlp_cds=rowdataset8 mlp_eta=0.5 mlp_miters=8
mlp_hidden_varlog=1.2 mlp_sparse=-1 mlp_hidden_min=5 mlp_hidden_max=300
mlp_rounds=8 mlp_nensemble=4 mlp_%error=0.0267507 mlp_eta_varlog=1.5
mlp_eta_init=0.5 mlp_crossvalidate=1 mlp_extractor=none mlp_cv_split=0.8
mlp_%nsamples=2.30921e+06
ninput 900 nhidden 90 noutput 93
w1 [-15.0474,25.0991]
b1 [-24.2937,
w2 [-24.3254,14.7334]
b2 [-5.92876,
JUNKCLASS MODEL MLP
mlp_normalization=-1 mlp_hidden_hi=80 mlp_noopt=0 mlp_hidden_lo=20
mlp_cv_max=5000 mlp_cds=rowdataset8 mlp_eta=0.5 mlp_miters=8
mlp_hidden_varlog=1.2 mlp_sparse=-1 mlp_hidden_min=5 mlp_hidden_max=300
mlp_rounds=8 mlp_nensemble=4 mlp_%error=0.0232608 mlp_eta_varlog=1.5
mlp_eta_init=0.5 mlp_crossvalidate=1 mlp_extractor=none mlp_cv_split=0.8
mlp_%nsamples=5.27385e+06
ninput 900 nhidden 103 noutput 2
w1 [-34.5243,30.0879]
b1 [-29.1538,
w2 [-12.044,12.044]
b2 [-0.0241522,
ULCLASS MODEL MLP
mlp_normalization=-1 mlp_hidden_hi=80 mlp_noopt=0 mlp_hidden_lo=20
mlp_cv_max=5000 mlp_cds=rowdataset8 mlp_eta=0.5 mlp_miters=8
mlp_hidden_varlog=1.2 mlp_sparse=-1 mlp_hidden_min=5 mlp_hidden_max=300
mlp_rounds=8 mlp_nensemble=4 mlp_eta_varlog=1.5
mlp_eta_init=0.5 mlp_crossvalidate=1 mlp_extractor=none mlp_cv_split=0.8
ninput 0 nhidden 0 noutput 0
vad...@ubuntu:~/ocropus$

Question: Am I doing something wrong? I don't understand why the results of ocropus page are so garbagey. I see a few places where there are 3 characters in a row that I suspect may have been correctly identified. Should I be able to run it right out of the box, following the install instructions, and use the page option immediately?

The VMware appliance is running on a 2.4 GHz Core 2 Duo Mac running OS X 10.6.2, VMware v 3.02.

--
You received this message because you are subscribed to the Google Groups "ocropus" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/ocropus?hl=en.
