Thank you very much, Tom, for the information. I'll look into the simulation tool.

Happy New Year,
Shibamouli

On Monday, January 5, 2015 12:20:42 PM UTC-5, Tom wrote:
>
> Actually, it takes surprisingly little data: after a few thousand lines of
> text, you already get pretty readable results for Latin text.
>
> You can train on simulated data as well with good results: a tool for
> generating training data artificially is included (but probably requires a
> bit of adaptation for other scripts).
>
> Tom
>
> On Tuesday, December 23, 2014 6:40:17 PM UTC-8, Shibamouli Lahiri wrote:
>>
>> Hi Tom,
>>
>> Thanks much for the update. I'm new to Ocropus, and I had a question on
>> running rtrain.
>>
>> Do you know (or have an estimate of) how many lines of text does the
>> program take (to train) before it starts giving reasonable results? I'm
>> wondering because since it's neural network based, I'd hazard a guess that
>> it'd take more than a few thousand lines?
>>
>> More details: I'm working on gathering labeled data for Bengali (Bangla)
>> OCR, and needed an estimate of lines that I'll need to transcribe as a
>> starter.
>>
>> Regards,
>> Shibamouli
>>
>> On Wednesday, December 17, 2014 2:40:11 PM UTC-5, Tom wrote:
>>>
>>> With the new recognizer, it should be pretty easy to train. We've
>>> trained it for other scripts purely from generated data and gotten pretty
>>> good results.
>>>
>>> I'll try to create some more documentation and some simpler training
>>> scripts.
>>>
>>> Tom
>>>
>>> On Wednesday, December 17, 2014 5:36:34 AM UTC-8, 81+ yrsold wrote:
>>>>
>>>> Tom,
>>>> I am really happy - you have resumed ocropus project again. Trust this
>>>> time I hope Ocropus Project will support for Indic lang(Indian languages)
>>>> this time.
>>>> With warmest regards,
>>>> sriranga(81+yrs)
>>>>
>>>> On Wednesday, December 17, 2014 3:56:52 AM UTC+5:30, Tom wrote:
>>>>>
>>>>> I joined Google this year. Google permits me to spend time on the
>>>>> OCRopus project and contribute. As part of this, I moved the project to
>>>>> Github, because it's easier to maintain there.
>>>>>
>>>>> I just pushed out a new update of ocropy. This includes mainly
>>>>> faster/smaller saving of models, as well as a C++ implementation of the
>>>>> LSTM network. The C++ LSTM implementation is a pretty straightforward
>>>>> port of the Python version and runs much faster. The C++ classes have
>>>>> been wrapped as Python classes and are callable from Python. There are
>>>>> two new top-level drivers, ocropus-ltrain and ocropus-lpred, for the C++
>>>>> implementation. The C++ implementation appears to be numerically close to
>>>>> the Python implementation and yield good recognizers when trained, but it
>>>>> requires more testing.
>>>>>
>>>>> As before, this is research-level software with minimal documentation
>>>>> (do look at the iPython Notebooks, the .ipynb files, since they contain
>>>>> significant information). Feel free to contribute patches, documentation,
>>>>> etc. using the usual Github mechanisms of merge requests. I'll try to
>>>>> incorporate them as time permits.
>>>>>
>>>>> Tom
