+ tesseract-dev google group

Thank you, Marco. I will download the training tools packages and give
it a try.

In future updates to the tesseract package, may I suggest packaging more
languages from 'tessdata' (https://github.com/tesseract-ocr/tessdata),
especially those that have multiple files per language, such as ara,
hin, etc.

The languages that have just one file for traineddata can be downloaded
easily as a zip from the 'raw' link. It would be very helpful to have a
single tar/zip for the others.
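A minimal sketch of what such a per-language bundle could look like. It assumes a local copy of the tessdata repository in which every file belonging to a language starts with the language code and a dot (e.g. ara.traineddata together with the other ara.* files). The function name bundle_lang and the output name LANG.tar.gz are hypothetical choices, not part of any existing packaging script:

```shell
#!/bin/sh
# Hypothetical helper: pack every file named LANG.* from a local tessdata
# directory into a single LANG.tar.gz in the current directory.
# Usage: bundle_lang TESSDATA_DIR LANG
bundle_lang() {
  dir=$1
  lang=$2
  # Expand the glob; if nothing matches, sh leaves the pattern unexpanded,
  # so checking that the first result exists detects "no files".
  set -- "$dir"/"$lang".*
  if [ ! -e "$1" ]; then
    echo "no files for '$lang' in $dir" >&2
    return 1
  fi
  # cd inside a subshell so the archive stores bare file names, not full
  # paths; the output redirection is opened in the caller's directory.
  ( cd "$dir" && tar -czf - "$lang".* ) > "$lang.tar.gz"
}
```

For example, bundle_lang ~/tessdata ara would write ara.tar.gz containing all ara.* files, so a user could fetch one archive per language instead of each file separately.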

Thanks so much for packaging 3.04.00 for cygwin.

ShreeDevi
____________________________________________________________
Devotional songs (bhajans, kirtans, aartis) @ http://bhajans.ramparivar.com

On Sun, Aug 2, 2015 at 12:53 PM, Marco Atzeri <[email protected]>
wrote:

> On 7/29/2015 11:40 AM, ShreeDevi Kumar wrote:
>
>> Marco,
>>
>> Thanks for building the training tools for cygwin. Until now, only the
>> additional binaries have been shipped as part of the tesseract package.
>>
>> With Tesseract 3.04.00 there are additional scripts provided to help
>> with training. Google has also provided the language data which can be
>> used for training different languages and building the traineddata
>> files. Hence my request to include these.
>>
>> Not all users will be interested in training for a new language or
>> trying to improve an existing traineddata file, so in my opinion, it may
>> be better to package these separately.
>>
>
> Hi ShreeDevi
> uploading 3.04.00-2.
>
> The training tools are in the new package
>   tesseract-training-util
>
> while the language training files are split between
>   tesseract-training-core
>   tesseract-training-{lang}
>
> I have not changed the previous data structure;
> I just added an additional level
>   /usr/share/tessdata/training
>
> and the two test files are in
>   /usr/share/tessdata/testing/eurotext.tif
>   /usr/share/tessdata/testing/phototest.tif
>
>
> $ cygcheck -l tesseract-training-util
> /usr/bin/ambiguous_words.exe
> /usr/bin/classifier_tester.exe
> /usr/bin/cntraining.exe
> /usr/bin/combine_tessdata.exe
> /usr/bin/dawg2wordlist.exe
> /usr/bin/mftraining.exe
> /usr/bin/set_unicharset_properties.exe
> /usr/bin/shapeclustering.exe
> /usr/bin/text2image.exe
> /usr/bin/unicharset_extractor.exe
> /usr/bin/wordlist2dawg.exe
> /usr/bin/language-specific.sh
> /usr/bin/tesstrain.sh
> /usr/bin/tesstrain_utils.sh
>
> $ cygcheck -l tesseract-training-core
> /usr/share/tessdata/training/Arabic.unicharset
> /usr/share/tessdata/training/Arabic.xheights
> ...
> /usr/share/tessdata/training/Cherokee.xheights
> /usr/share/tessdata/training/common.punc
> /usr/share/tessdata/training/common.unicharambigs
> /usr/share/tessdata/training/Common.unicharset
> /usr/share/tessdata/training/Cyrillic.unicharset
> ...
> /usr/share/tessdata/training/Ethiopic.xheights
> /usr/share/tessdata/training/font_properties
> /usr/share/tessdata/training/forbidden_characters_default
> /usr/share/tessdata/training/Georgian.unicharset
> ...
> /usr/share/tessdata/training/Tibetan.unicharset
>
> $ cygcheck -l tesseract-training-eng
> /usr/share/tessdata/training/eng/desired_characters
> /usr/share/tessdata/training/eng/eng.cube-unicharset
> /usr/share/tessdata/training/eng/eng.cube-word-dawg
> /usr/share/tessdata/training/eng/eng.numbers
> /usr/share/tessdata/training/eng/eng.punc
> /usr/share/tessdata/training/eng/eng.training_text
> /usr/share/tessdata/training/eng/eng.training_text.bigram_freqs
> /usr/share/tessdata/training/eng/eng.training_text.unigram_freqs
> /usr/share/tessdata/training/eng/eng.unicharambigs
> /usr/share/tessdata/training/eng/eng.word.bigrams
> /usr/share/tessdata/training/eng/eng.wordlist
>
> Regards
> Marco
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/55BDC558.2090205%40gmail.com.
> For more options, visit https://groups.google.com/d/optout.
>

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduWr4%3Dw1%3D024PCj5eKBYs_b3Jx3DOtgGp4UonwyB5EO7Rg%40mail.gmail.com.
