Dear Tom,

From my practical experience, OCRopus cannot train on a large number of image files because it runs into a memory error. It is better to train on a smaller number (not size) of image files to avoid the memory error. In this context, I suggest training folder1, folder2, folder3, etc. separately, each containing a small number of TIFF or PNG image files of Lang. Training each folder will generate the following data files:
1. boxdata.h5
2. boxdata.cmodel
3. boxdata.split
4. page.bin.png
5. page.pseg.png
6. page.
7. book-xxxx

If the script ./run-box-training is run for each of folder1, folder2, folder3, etc., it will generate these 7 data files separately for each folder. My question is whether it is possible to copy and paste the contents of the 7 data files generated this way *into the main 7 data files*. The main 7 data files would then contain the combined contents of all the data files generated by folder1, folder2, folder3, etc.

Awaiting your comments.

With warmest regards,

-- 
You received this message because you are subscribed to the Google Groups "ocropus" group.
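The workaround suggested in the message (training folder1, folder2, folder3, etc., each holding only a few images) needs the images to be split into small folders first. Below is a minimal Python sketch of that step, assuming all TIFF/PNG pages start out in one source directory. The function name `split_into_folders` and the default batch size of 50 are hypothetical choices for illustration, not part of OCRopus; the right batch size depends on how much memory your machine has.

```python
from pathlib import Path
import shutil

# File types the message mentions (TIFF or PNG); matched case-insensitively.
IMAGE_SUFFIXES = {".tif", ".tiff", ".png"}


def split_into_folders(src_dir, dest_root, batch_size=50):
    """Copy images from src_dir into dest_root/folder1, folder2, ...

    Each folder receives at most batch_size images, taken in sorted
    filename order. The originals are copied, not moved. Returns the
    list of created folder paths. batch_size=50 is a hypothetical default.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    src = Path(src_dir)
    root = Path(dest_root)
    # Collect only regular files with an image suffix; other files are ignored.
    images = sorted(
        p for p in src.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    folders = []
    for start in range(0, len(images), batch_size):
        folder = root / f"folder{len(folders) + 1}"
        folder.mkdir(parents=True, exist_ok=True)
        for image in images[start:start + batch_size]:
            shutil.copy2(image, folder / image.name)  # keeps timestamps too
        folders.append(folder)
    return folders
```

For example, `split_into_folders("pages", "batches", batch_size=20)` would create `batches/folder1`, `batches/folder2`, and so on, and ./run-box-training would then be run on each folder as described in the message. This sketch only prepares the small batches; it does not answer whether the resulting data files can be merged afterwards.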
