Hello list :)

I'm trying to make Tesseract 3 run with my custom training files, and I have a
few questions, if some of you can answer them.

1) Can I combine the old 2.04 xxx.* files into the new xxx.traineddata
directly, or does the new format also expect new things which weren't in the
old 2.04 files?

2) I'm not sure I understand xxx.punc-dawg and xxx.number-dawg. Can someone
explain? As far as I understand, they are for disambiguation of punctuation and
numbers, right? So what am I supposed to do, for example, if a D resembles a 0:
1       D       1       0 -> goes in freq-dawg,
1       0       1       D -> goes in number-dawg?
Or should only numbers go in number-dawg, for example
1       8       1       0 -> only goes in number-dawg then?

3) I concatenated my old config files (just duplicating freq-dawg as punc-dawg
and number-dawg), and I'm trying to run tesseract... I'm getting an assertion
failure (probably on a map) saying that num <= (SIZE_MAX / elementSize); see
the stack below.

Has anyone successfully run svn319 with custom traineddata?

Thanks,
Pierre.

        XXX.dll!_fread_nolock_s(void * buffer=0x10560040, unsigned int bufferSize=4294967295, unsigned int elementSize=8, unsigned int num=1130430464, _iobuf * stream=0x00750c60)  Line 156 + 0x35 bytes  C
        XXX.dll!fread_s(void * buffer=0x10560040, unsigned int bufferSize=4294967295, unsigned int elementSize=8, unsigned int count=1130430464, _iobuf * stream=0x00750c60)  Line 109 + 0x19 bytes  C
        XXX.dll!fread(void * buffer=0x10560040, unsigned int elementSize=8, unsigned int count=1130430464, _iobuf * stream=0x00750c60)  Line 303 + 0x17 bytes  C
        XXX.dll!tesseract::SquishedDawg::read_squished_dawg(_iobuf * file=0x00750c60, tesseract::DawgType type=DAWG_TYPE_PUNCTUATION, const STRING & lang={...}, PermuterType perm=PUNC_PERM)  Line 298 + 0x19 bytes  C++
        XXX.dll!tesseract::SquishedDawg::SquishedDawg(_iobuf * file=0x00750c60, tesseract::DawgType type=DAWG_TYPE_PUNCTUATION, const STRING & lang={...}, PermuterType perm=PUNC_PERM)  Line 350  C++
        XXX.dll!tesseract::Dict::init_permute()  Line 276 + 0x30 bytes  C++
        XXX.dll!tesseract::Wordrec::program_editup(const char * textbase=0x00000000, bool init_permute=true)  Line 98  C++
        XXX.dll!tesseract::Wordrec::start_recog(const char * textbase=0x00000000)  Line 75  C++
        XXX.dll!tesseract::Tesseract::init_tesseract(const char * arg0=0x006bd948, const char * textbase=0x00000000, const char * language=0x006bd944, char * * configs=0x00000000, int configs_size=0, bool configs_global_only=false)  Line 186  C++
        XXX.dll!tesseract::TessBaseAPI::Init(const char * datapath=0x006bd948, const char * language=0x006bd944, char * * configs=0x00000000, int configs_size=0, bool configs_global_only=false)  Line 154 + 0x44 bytes  C++
>       XXX.dll!tesseract::TessBaseAPI::Init(const char * datapath=0x006bd948, const char * language=0x006bd944)  Line 141  C++
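
For what it's worth, the failing check can be reproduced from the numbers in
the fread frames of the stack above. This is only a minimal sketch: it assumes
a 32-bit build (SIZE_MAX = 4294967295, which matches the bufferSize shown in
the trace), and the helper name fread_count_is_valid is made up for
illustration, it is not a Tesseract or CRT function.

```python
# Values copied from the fread frames in the stack trace above.
SIZE_MAX_32 = 4294967295   # assumption: 32-bit size_t, matches bufferSize in the trace
ELEMENT_SIZE = 8           # elementSize passed to fread
NUM = 1130430464           # element count passed to fread


def fread_count_is_valid(num, element_size, size_max=SIZE_MAX_32):
    """Hypothetical helper mirroring the asserted condition
    num <= SIZE_MAX / elementSize (integer division)."""
    return num <= size_max // element_size


# Total number of bytes the read is asking for.
requested_bytes = NUM * ELEMENT_SIZE
print(requested_bytes)

# Whether the element count passes the asserted condition.
print(fread_count_is_valid(NUM, ELEMENT_SIZE))
```

With these values the read asks for roughly 9 GB, more than a 32-bit size can
hold, so the condition fails. That makes me think the element count that
read_squished_dawg gets from my punc-dawg data is garbage, which would fit a
file that is not in the format the reader expects, but I have not confirmed
this.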
