Thank you very much for the information, Tom. I'll take a look at the 
simulation tool.

Happy New Year,
Shibamouli



On Monday, January 5, 2015 12:20:42 PM UTC-5, Tom wrote:
>
> Actually, it takes surprisingly little data: after a few thousand lines of 
> text, you already get pretty readable results for Latin text.  
>
> You can train on simulated data as well with good results: a tool for 
> generating training data artificially is included (but probably requires a 
> bit of adaptation for other scripts).
>
> Tom
>
> On Tuesday, December 23, 2014 6:40:17 PM UTC-8, Shibamouli Lahiri wrote:
>>
>> Hi Tom,
>>
>> Thanks much for the update. I'm new to Ocropus, and I had a question on 
>> running rtrain.
>>
>> Do you know (or have an estimate of) how many lines of text the 
>> program needs for training before it starts giving reasonable results? I'm 
>> wondering because, since it's neural-network based, I'd hazard a guess that 
>> it'd take more than a few thousand lines.
>>
>> More details: I'm working on gathering labeled data for Bengali (Bangla) 
>> OCR, and I need an estimate of how many lines I'll have to transcribe to 
>> get started.
>>
>> Regards,
>> Shibamouli
>>
>>
>>
>> On Wednesday, December 17, 2014 2:40:11 PM UTC-5, Tom wrote:
>>>
>>> With the new recognizer, it should be pretty easy to train. We've 
>>> trained it for other scripts purely from generated data and gotten pretty 
>>> good results.
>>>
>>> I'll try to create some more documentation and some simpler training 
>>> scripts.
>>>
>>> Tom
>>>
>>> On Wednesday, December 17, 2014 5:36:34 AM UTC-8, 81+ yrsold wrote:
>>>>
>>>> Tom,
>>>> I am really happy that you have resumed the OCRopus project. I hope that 
>>>> this time the OCRopus project will support Indic (Indian) languages.
>>>> With warmest regards,
>>>> sriranga(81+yrs) 
>>>>
>>>> On Wednesday, December 17, 2014 3:56:52 AM UTC+5:30, Tom wrote:
>>>>>
>>>>> I joined Google this year. Google permits me to spend time on the 
>>>>> OCRopus project and contribute. As part of this, I moved the project to 
>>>>> GitHub, because it's easier to maintain there.
>>>>>
>>>>> I just pushed out a new update of ocropy. This includes mainly 
>>>>> faster/smaller saving of models, as well as a C++ implementation of the 
>>>>> LSTM network. The C++ LSTM implementation is a pretty straightforward port 
>>>>> of the Python version and runs much faster. The C++ classes have been 
>>>>> wrapped as Python classes and are callable from Python. There are two new 
>>>>> top-level drivers, ocropus-ltrain and ocropus-lpred, for the C++ 
>>>>> implementation. The C++ implementation appears to be numerically close to 
>>>>> the Python implementation and yields good recognizers when trained, but it 
>>>>> requires more testing.
>>>>>
>>>>> As before, this is research-level software with minimal documentation 
>>>>> (do look at the IPython notebooks, the .ipynb files, since they contain 
>>>>> significant information). Feel free to contribute patches, documentation, 
>>>>> etc. using the usual GitHub mechanism of pull requests. I'll try to 
>>>>> incorporate them as time permits.
>>>>>
>>>>> Tom
>>>>>
>>>>
