Building the training tools on Windows is not a priority, but it should be
possible to compile most of the tools with Cygwin or MSYS/MinGW.



Zdenko


On Wed, Aug 6, 2014 at 5:20 PM, Shree Devi Kumar wrote:

>  My current plan for documentation is as follows:
>>
>> - Rewrite and simplify TrainingTesseract3 on the wiki
>> - Write manpages for each tool in training/
>> - Document how each training file is used, and what it contains
>>
>> Does that sound good to people? I'll take silence from the list to
>> mean "that sounds perfect in every way, you wonderful man." ;)
>
>
> Thanks, Nick. That's great. You should probably have separate sections for
> training 3, 3.02, 3.03, 3.03.03, etc., since the method has changed quite
> a bit.
>
> BTW, do you know if the new training tools can be compiled on Windows, or
> do I need to get access to Linux somewhere to give them a try?
>
>
>
>
>
> Shree Devi Kumar
> ____________________________________________________________
> Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com
>
>
> On Wed, Aug 6, 2014 at 8:23 PM, Nick White wrote:
>
>> Hi Albrecht,
>>
>> Sorry for not replying sooner, I've been away.
>>
>> > Nevertheless, I read a post from Ray where he says that he receives
>> > millions of emails and the last thing he likes to do is write long
>> > texts (email responses or documentation). I think this is a fatal
>> > situation, because if he is the only one who really knows the code, he
>> > is the natural person to write that documentation. But I understand
>> > that he is not motivated to do that. He is testing new classifiers
>> > rather than caring about what is already done.
>>
>> Ah, but others can work to figure out how the code and tools work,
>> and slowly but surely piece together documentation. Also, Ray is
>> good at explaining when he has the time. I agree it isn't an ideal
>> situation, but I think we can fix it.
>>
>>
>> > I studied the code of the set_unicharset_properties tool.
>> > But this is a very basic tool. It only sets the basic properties.
>> > The min/max values don't get touched.
>>
>> This is wrong, actually. The unicharset.SetPropertiesFromOther()
>> function called in set_unicharset_properties copies all properties
>> from any copy of the character found in the script_dir. As I
>> mentioned in my previous message to this thread, set the script_dir
>> to the training/langdata directory and the data from all the
>> .unicharset files there will be pulled in as appropriate.
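To illustrate the copying behaviour described above, here is a minimal Python sketch. It is not Tesseract's code: it assumes each .unicharset file has already been parsed into a dict from character to properties, and the `CharProps` record and `set_properties_from_others` function are hypothetical simplifications (the real format carries more fields, such as the property flags and other_case). In this sketch the reference files are applied in order, so a later file overrides an earlier one, and characters found in no reference keep their original values.

```python
from dataclasses import dataclass
from typing import Dict, List


# Hypothetical, simplified per-character record. The real .unicharset
# format carries more fields (property flags, other_case, direction, ...).
@dataclass(frozen=True)
class CharProps:
    script: str
    min_bottom: int
    max_bottom: int
    min_top: int
    max_top: int


Unicharset = Dict[str, CharProps]


def set_properties_from_others(target: Unicharset,
                               references: List[Unicharset]) -> Unicharset:
    """Return a copy of `target` in which every character that also appears
    in one of `references` takes that reference entry's properties.
    References are applied in order (a later one overrides an earlier one);
    characters found in no reference keep their original properties."""
    result = dict(target)  # do not mutate the caller's unicharset
    for ref in references:
        for ch in result:
            if ch in ref:
                result[ch] = ref[ch]
    return result


# A freshly extracted unicharset with placeholder metrics (0..255 = unknown).
mine = {
    "a": CharProps("Latin", 0, 255, 0, 255),
    "x": CharProps("Latin", 0, 255, 0, 255),
}
latin_ref = {"a": CharProps("Latin", 59, 68, 120, 140)}
merged = set_properties_from_others(mine, [latin_ref])
print(merged["a"].min_bottom, merged["x"].min_bottom)  # prints "59 0"
```

The loop over reference dicts mirrors the idea of pulling in every matching .unicharset file from script_dir; how the real tool resolves conflicts between files is not covered by this sketch.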
>>
>> > I'm sure that there must be a tool (that is not published) that
>> > obtains them, because han.unicharset has 23514 characters defined,
>> > all with min/max values set. Or do you think that someone has edited
>> > 23514 characters manually?
>>
>> Ultimately, yes, there must be an unpublished tool that obtains the
>> metrics that exist in the training/langdata directory. I suspect it
>> looks quite like the Pango-based proof of concept I attached to a
>> previous mail on this thread (charmetrics.c).
>>
>> > That is not how open source projects should work.
>>
>> So, you pick yourself up and jump in! That's how open source
>> projects should work. Patches are welcomed :)
>>
>> > > Are there particular things you'd like
>> > > documented, that I could start on?
>> >
>> > I would like to generate unicharset files automatically, but I don't
>> > know how to calculate the min/max values.
>>
>> As I say, you can get good general figures by using the --script_dir
>> option with set_unicharset_properties. I think we're clear now on
>> the general definitions of all the fields.
>>
>> To calculate the min/max values for specific fonts where they may be
>> very different, I recommend you try the charmetrics.c tool I posted,
>> and compare the output to what you get without it.
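As a rough illustration of what a tool like charmetrics.c has to compute, the Python sketch below turns per-sample glyph measurements into min/max ranges. It assumes you already have, for every rendered sample, the glyph's bottom and top plus the line's baseline and x-height in pixels (rendering and measuring, e.g. with Pango, is not shown). The `GlyphSample` type and the function names are hypothetical, and the normalized space (baseline at 64, x-height spanning 128 units, clipped to 0..255) is an assumption modelled on Tesseract's baseline normalization; check it against the source before trusting the numbers. Only bottom/top ranges are computed here; the real files may carry further metrics such as widths.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

# Assumed normalization constants (baseline-normalized space, 0..255).
BASELINE_OFFSET = 64
X_HEIGHT_NORM = 128


@dataclass
class GlyphSample:
    """One rendered occurrence of a character (hypothetical input format).
    Coordinates are in pixels with y increasing upward."""
    char: str
    bottom: float
    top: float
    baseline: float
    x_height: float  # height of a lowercase 'x' on this line, in pixels


def normalize_y(y: float, baseline: float, x_height: float) -> int:
    """Map a pixel y coordinate into the assumed 0..255 normalized space."""
    v = BASELINE_OFFSET + (y - baseline) * X_HEIGHT_NORM / x_height
    return max(0, min(255, round(v)))


def char_metric_ranges(samples: Iterable[GlyphSample]
                       ) -> Dict[str, Tuple[int, int, int, int]]:
    """Return {char: (min_bottom, max_bottom, min_top, max_top)}."""
    ranges: Dict[str, Tuple[int, int, int, int]] = {}
    for s in samples:
        b = normalize_y(s.bottom, s.baseline, s.x_height)
        t = normalize_y(s.top, s.baseline, s.x_height)
        if s.char not in ranges:
            ranges[s.char] = (b, b, t, t)
        else:
            lo_b, hi_b, lo_t, hi_t = ranges[s.char]
            ranges[s.char] = (min(lo_b, b), max(hi_b, b),
                              min(lo_t, t), max(hi_t, t))
    return ranges


samples = [
    GlyphSample("x", bottom=100, top=120, baseline=100, x_height=20),
    GlyphSample("x", bottom=49, top=60, baseline=50, x_height=10),
    GlyphSample("p", bottom=90, top=120, baseline=100, x_height=20),
]
print(char_metric_ranges(samples))
# prints {'x': (51, 64, 192, 192), 'p': (0, 0, 192, 192)}
```

Feeding in samples of the same character rendered in several fonts is what widens each range, which is why font-specific values can differ from the general figures pulled in via --script_dir.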
>>
>> > If you want an idea of where to start: I think a good starting point
>> > would be to explain what all these training files are for and what
>> > exactly they do. What is INTTEMP, what values does it contain, how is
>> > it generated in the training process, and how is it used in
>> > recognition? What is PFFMTABLE for, NORMPROTO, etc.?
>> >
>> > And then the DAWG files.
>> > I still do not understand at which step of recognition the Number
>> > DAWG is used. (Did you see the weird things it contains?)
>> > And what is the PUNC DAWG for, and how exactly is it used? How should
>> > I generate the values in it?
>> > What is the difference between a flat shape table and a clustered
>> > shape table?
>>
>> These are all good points, and good places to start, thank you.
>>
>> My current plan for documentation is as follows:
>>
>> - Rewrite and simplify TrainingTesseract3 on the wiki
>> - Write manpages for each tool in training/
>> - Document how each training file is used, and what it contains
>>
>> Does that sound good to people? I'll take silence from the list to
>> mean "that sounds perfect in every way, you wonderful man." ;)
>>
>> Nick
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/20140806145323.GG7804%40manta.lan
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>