Hello list :)
I'm trying to get Tesseract 3 to run with my custom training files, and I have a
few questions that I hope some of you can answer.
1) Can I combine the old 2.04 xxx.* files into the new xxx.traineddata directly,
or does the new format also expect things that weren't in the old 2.04
files?
2) I'm not sure I understand xxx.punc-dawg and xxx.number-dawg. Can someone
explain? As far as I understand, they are for disambiguating punctuation and
numbers, right? So what am I supposed to do, for example, if a D resembles a
0:
1 D 1 0 -> goes in freq-dawg,
1 0 1 D -> goes in number-dawg?
Or should only numbers go in number-dawg, for example
1 8 1 0 -> goes only in number-dawg then?
3) I concatenated my old config files (just duplicating freq-dawg as punc-dawg
and number-dawg), and I'm trying to run Tesseract... I'm getting an assertion
failure (probably on a map) saying that num <= SIZE_MAX / elementSize (see stack
below).
Has anyone successfully run svn319 with custom traineddata?
Thanks,
Pierre.
XXX.dll!_fread_nolock_s(void * buffer=0x10560040, unsigned int bufferSize=4294967295, unsigned int elementSize=8, unsigned int num=1130430464, _iobuf * stream=0x00750c60) Line 156 + 0x35 bytes C
XXX.dll!fread_s(void * buffer=0x10560040, unsigned int bufferSize=4294967295, unsigned int elementSize=8, unsigned int count=1130430464, _iobuf * stream=0x00750c60) Line 109 + 0x19 bytes C
XXX.dll!fread(void * buffer=0x10560040, unsigned int elementSize=8, unsigned int count=1130430464, _iobuf * stream=0x00750c60) Line 303 + 0x17 bytes C
XXX.dll!tesseract::SquishedDawg::read_squished_dawg(_iobuf * file=0x00750c60, tesseract::DawgType type=DAWG_TYPE_PUNCTUATION, const STRING & lang={...}, PermuterType perm=PUNC_PERM) Line 298 + 0x19 bytes C++
XXX.dll!tesseract::SquishedDawg::SquishedDawg(_iobuf * file=0x00750c60, tesseract::DawgType type=DAWG_TYPE_PUNCTUATION, const STRING & lang={...}, PermuterType perm=PUNC_PERM) Line 350 C++
XXX.dll!tesseract::Dict::init_permute() Line 276 + 0x30 bytes C++
XXX.dll!tesseract::Wordrec::program_editup(const char * textbase=0x00000000, bool init_permute=true) Line 98 C++
XXX.dll!tesseract::Wordrec::start_recog(const char * textbase=0x00000000) Line 75 C++
XXX.dll!tesseract::Tesseract::init_tesseract(const char * arg0=0x006bd948, const char * textbase=0x00000000, const char * language=0x006bd944, char * * configs=0x00000000, int configs_size=0, bool configs_global_only=false) Line 186 C++
XXX.dll!tesseract::TessBaseAPI::Init(const char * datapath=0x006bd948, const char * language=0x006bd944, char * * configs=0x00000000, int configs_size=0, bool configs_global_only=false) Line 154 + 0x44 bytes C++
> XXX.dll!tesseract::TessBaseAPI::Init(const char * datapath=0x006bd948, const char * language=0x006bd944) Line 141 C++
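
For reference, the failing condition can be checked by hand with the numbers from the fread_s frame. This is only a minimal sketch, assuming a 32-bit build where SIZE_MAX is 4294967295 (which matches the bufferSize value in the trace); count_fits is a hypothetical helper written for illustration, not part of Tesseract or the C runtime.

```python
# Values copied from the fread_s frame in the stack trace.
ELEMENT_SIZE = 8
COUNT = 1130430464

# Assumption: 32-bit build, so SIZE_MAX is 2**32 - 1
# (the same value as bufferSize=4294967295 in the trace).
SIZE_MAX_32 = 2**32 - 1


def count_fits(count: int, element_size: int, size_max: int = SIZE_MAX_32) -> bool:
    """Mirror the asserted condition: count <= SIZE_MAX / elementSize."""
    return count <= size_max // element_size


# Largest element count that passes the check for 8-byte elements.
limit = SIZE_MAX_32 // ELEMENT_SIZE

print(f"largest allowed count: {limit}")
print(f"requested count:       {COUNT} (0x{COUNT:08X})")
print(f"check passes:          {count_fits(COUNT, ELEMENT_SIZE)}")
```

The requested count is about twice the limit, so the assertion fires before any data is read. If that count comes from the file itself, as the read_squished_dawg frame suggests, such a large value may mean the data at that position is not in the format the reader expects.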