Dear Daniel,

Is there not an easier way to do this? I work with GUI tools, and this is my problem:

I'm trying to train Tesseract for Kurdish, which would be useful for Persian too. Kurdish has a few additional letters, but it is written the same way as Arabic or Farsi. The problem is that the final OCR result comes out left to right instead of right to left, so the text is unreadable even though the individual letters are correct.

I use qt-box-editor to edit the box files, then Serak Tesseract Trainer V0.4 to train the OCR, and finally I put the traineddata file in the Tesseract directory. Everything goes well except that the right-to-left writing direction of Arabic script is missing. So is there any way to change the unicharset file with a GUI instead of the command line?

Thanks a lot in advance,
Karo

On Monday, 15 July 2013 01:02:59 UTC+2, Daniel wrote:
> Thanks WHITE N. & sdk.
>
> Both of you helped me so much! Thank you!
>
> For anybody else looking for a solution to this problem (an incorrect
> unicharset file generated by unicharset_extractor): I also ported the
> Python script that corrects the unicharset file to PHP, so if anyone
> needs such code, send me an email and I will send you the code.
>
> On Sunday, July 7, 2013 12:45:18 PM UTC+3, Daniel wrote:
>> Hi everyone,
>>
>> I am working on a project for which I need to train Tesseract for RTL
>> languages (Hebrew and Arabic).
>> After the training process everything works great, except that the
>> text is printed as LTR text.
>> Is there any flag to set during the training process that tells
>> Tesseract to treat the trained file as an RTL language file, so that it
>> prints the text in the right order?
>>
>> Thanks for helping!
>> Daniel
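The quoted message mentions a Python script that corrects the unicharset file, but the script itself is not included in this thread. The following is only a minimal sketch of what such a correction might look like, not Daniel's actual script. It rests on two assumptions that should be checked against your own file before use: that the direction field is the sixth space-separated field on each line (`DIRECTION_FIELD`, a hypothetical name), and that the file stores ICU-style direction numbers (0 for left-to-right, 1 for right-to-left, 13 for right-to-left Arabic). The helper names `direction_code`, `fix_line` and `fix_unicharset` are made up for illustration. Whether this change alone restores right-to-left output is not confirmed here.

```python
import unicodedata

# Assumption: the unicharset stores ICU-style direction numbers.
# 0 = left-to-right, 1 = right-to-left, 13 = right-to-left Arabic.
BIDI_TO_CODE = {"L": 0, "R": 1, "AL": 13}

# Assumption: 0-based position of the direction field on each line
# (character, properties, metrics, script, other_case, direction, ...).
# Check this against your own unicharset file before running.
DIRECTION_FIELD = 5


def direction_code(text):
    """Return the assumed code for the first strongly directional character."""
    for ch in text:
        code = BIDI_TO_CODE.get(unicodedata.bidirectional(ch))
        if code is not None:
            return code
    return None


def fix_line(line, field=DIRECTION_FIELD):
    """Set the direction field of one line if its character is right-to-left."""
    parts = line.split(" ")
    if len(parts) <= field:
        return line  # count line or short entry: leave untouched
    code = direction_code(parts[0])
    if code in (1, 13):
        parts[field] = str(code)
    return " ".join(parts)


def fix_unicharset(src_path, dst_path):
    """Copy src_path to dst_path, rewriting the direction of RTL characters."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(fix_line(line.rstrip("\n")) + "\n")
```

A call such as `fix_unicharset("kur.unicharset", "kur.fixed.unicharset")` (both file names are placeholders) writes a corrected copy and leaves the original untouched, so the two files can be compared in any text editor before retraining.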
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.

