Related material: https://github.com/tesseract-ocr/tesseract/issues/1465
"diplopia tesseract" + "tesseract btw vat number" search should pop up a
few similar issues and message list messages from people with similar
unwanted effects due to the same fundamental cause:
the way an lstm (IIRC tesseract mentions it's using BLSTM == bidirectional
lstm somewhere) works goes something like this:
The text line image (cropped from your input image, where the crop box has
been calculated by the tesseract segmentation code) is fed into the LSTM
engine as vertical scanline RGB pixel vectors, and the LSTM engine outputs
an alphabet probability vector per input pixel vector (a "time step" in
LSTM theory papers). "Alphabet" here means "the total set of possible
characters we've been set up to recognize", so that's many more than your
regular intuitive ABC...Z.
Let's say, for this thought experiment, that every character (*glyph*) in
your input image is 10 pixels wide, including the little spacing that
separates characters in a printed word. Then you'll observe 10 output
vectors while that character's image slice is scanned and fed into the
LSTM engine. Each output vector assigns a probability to *every*
character in the output alphabet, which for English is the set
a..z, A..Z, 0..9, several punctuation symbols and a few others.
Let's say we're scanning the first zero in your "P01.01" input image: most
probabilities will be very low and thus negligible, but the "0" looks
quite like "O" (capital Oh), somewhat like an "o" (lowercase oh) and quite
like a "0" (zero).
That's one part of the story/process. There's more ...
As the LSTM engine is usually trained with a large dictionary, and as
tesseract (if I read the papers by Ray Smith from Google, who set it all
up, correctly) was targeted at OCR'ing papers, books and similar printed
publications, we have to consider the question "what was/is in those
dictionaries, and how do those affect us?" The answer: lots of words
you'll find in regular printed dictionaries, and very few, if any at all,
"codes", i.e. "words" which start with one or more alphabetic characters
(A-Z) followed by digits (0-9), as that kind of thing is not expected to
show up in books or academic papers, the original target for tesseract
(AFAICT).
Assuming this training assumption is correct (and current tesseract's
behaviour tells me I'm not far off the mark), we combine it with the
concept of "Markov chains" (see Wikipedia and/or your formal training at
uni). Yes, an LSTM is way more complex and powerful than an HMM or a basic
Markov chain, but Markov chains serve well enough here to understand
what's going on, and they're way more intuitive for most people, including
yours truly ;-) so I'll use them while trying to explain what you are
experiencing. Anyway, for this scenario we can say that LSTM engines
behave like ("smarter, more able") Markov chains, which I'll try to
explain using a few examples:
Take the English language and go back to our youth, when we were playing
the hangman game (Dutch: galgje), where you have to guess a word with the
fewest possible guesses. You quickly learn to try particular characters
from the A-Z alphabet first for maximum gain: you try the vowels, "s" and
"n", maybe "t", and only then do you start to really use your brain, if
you were like me ;-)
That's you, as a kid, intuitively discovering and applying character
frequency of occurrence, i.e. Shannon entropy. (Another inroad into this
may have been encountering Huffman coding.) We'll get back to tesseract
internals shortly, but for now imagine you are given the left half of a
printed character, that left half looks like "(" (a left half circle), and
your task is to guess which character in the English alphabet this might
be/become: your best bet is "O", second comes "C", and when numeric digits
are allowed, "0" is another good candidate. The LSTM engine is not smarter
than you, so you can expect it to rank those characters relatively high,
i.e. give them high (above-threshold) probabilities. And we haven't even
applied Markov chains yet, which help us improve our estimates!
The easiest example of Markov's usefulness is the letter "q": using
character frequency alone, i.e. Shannon, we know "q" doesn't show up all
that often. Another letter: "u" is a vowel, but by far not the most
frequent one, so that's another relatively low probability if all we've
got is Shannon. Markov chains, however, look at the *preceding code
element* for improved statistics, and that is a game changer: when you
observe a "q" in your input stream, it's a near-100% bet that it will be
followed by "u" and nothing else! (This "qu" sure thing holds for a lot of
European languages; only when you start feeding Chinese pinyin or Chinese
person names to your machine do you lose that particular bet, as then
you'll observe input such as "Qi Qi", which is a woman's name, like Mary
and John, but I digress ...)
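A minimal bigram Markov chain shows this "qu" effect directly. The tiny corpus below is invented for illustration, but the mechanics match what larger dictionary-trained models do:

```python
from collections import Counter, defaultdict

# Build bigram counts from a toy corpus (assumption: real models are
# trained on far larger dictionaries, but the mechanics are identical).
corpus = "the quick question quietly queued unique quotes"

pair_counts = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    pair_counts[a][b] += 1

def next_char_probs(prev):
    """Conditional distribution P(next | prev) from the bigram counts."""
    counts = pair_counts[prev]
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}

print(next_char_probs("q"))   # {'u': 1.0} -- after "q", "u" is a sure bet
```

Swap in a corpus containing pinyin names like "Qi" and that 1.0 immediately drops, which is exactly the "lose that particular bet" point above.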
So, taking this additional knowledge about frequency of occurrence, hence
*probability*, for *character sequences* that Markov chains give us, we
may realize that once we've observed a "P", the probability distribution
for the expected next character changes noticeably: one very good option
is the sequence "po", temporarily raising the probability of an "o"
occurring next, while the probabilities of many other characters
temporarily decrease. E.g. "n" may be quite frequent on its own (notice
those "qu"s just now, by the way? 😉), but "pn" is highly improbable. The
only word with "pn" I can think of right now, off the top of my head, is
"apnea", which I haven't heard all that often in human conversation, have
you? 😁
So far, so good: we now expect to rank anything that's starting to look a
little curvy (left half circle! = only 5 pixel vectors done, and already
we are like the kid in class who'd raise their hand before the teacher has
even finished asking the question) as a sure thing: "PO" output!
The LSTM engine is quite like that: it starts to yell
this-is-probably-capital-Oh (plus a few more highly-probables) quite early
while we scan across that initial zero. Of course the tesseract engine,
having been trained on numbers alongside those human dictionary words, is
a little doubtful while keeping its hand raised, so "0" is listed as
second-best probability. As we near the end of the character, that doubt
may turn into a rising awareness that this bugger is, worryingly, starting
to look more and more like a zero, even though it is unexpected: the
engine didn't get any "codes" during training, so *any* numeric digit
immediately following an alphabetic character (A-Z) is very much
unexpected by the Markov chain / LSTM engine.
(Yes, nitpickers, an LSTM engine is not an emotional human, nor an HMM /
pure Markov chain, but the story holds well enough and the machine reality
is very similar for this occasion.)
Lstm "memory" is more powerful than mere Markov chains technology because
it can look further back than just the previous character, so French "EA"
occurrence will kick next-bet-is-"U" into the top ranks for example:
*chapeau*, *bureau*, and many more have taught it to bet that way.
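To sketch that longer memory: below is a hypothetical context model that conditions on the last *two* characters instead of one. The word list is made up for illustration; a real LSTM learns such context from its training corpus, and can reach back much further still:

```python
from collections import Counter, defaultdict

# Toy order-2 context model, standing in for the LSTM's longer memory.
corpus = ["chapeau", "bureau", "plateau", "beautiful", "eat", "bean", "peau"]

ctx_counts = defaultdict(Counter)
for word in corpus:
    for i in range(len(word) - 2):
        ctx_counts[word[i:i + 2]][word[i + 2]] += 1

def prob(ctx, nxt):
    """P(next char | previous two chars) from the counts."""
    counts = ctx_counts[ctx]
    total = sum(counts.values())
    return counts[nxt] / total if total else 0.0

# After "ea" this toy corpus strongly favors "u" (chapeau, bureau, ...),
# a pattern a single-character Markov chain cannot capture.
print(prob("ea", "u"))
```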
Now we combine the two stories above (Markov/LSTM context-driven
probability distributions + vertical scanline input processing giving one
prediction output vector per horizontal pixel step), and you can expect
tesseract's LSTM to spit out probability sequences such as
. . O/0/C O/0/C O/0 O O . C/0/O 0/O
for our 10-pixel-wide "0" image part assumed at the beginning (in each "/"
set, the first character is ranked highest). The middle "." in this
10-vector sequence is tesseract getting confused / in doubt, and it's very
common to see this kind of bumpy/hiccupping probability output coming out
of LSTM engines.
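Here's a small sketch of how such a symbolic sequence could be rendered from per-timestep probability vectors; every number below is invented for illustration:

```python
# Hypothetical per-timestep probabilities for the 10 scanlines of the "0"
# (all numbers made up; real vectors cover the whole alphabet).
frames = [
    {"O": 0.04, "0": 0.03},                      # edges: nothing confident
    {"O": 0.06, "0": 0.05},
    {"O": 0.40, "0": 0.35, "C": 0.15},
    {"O": 0.42, "0": 0.36, "C": 0.12},
    {"O": 0.45, "0": 0.40},
    {"O": 0.55, "0": 0.08},
    {"O": 0.50, "0": 0.09},
    {"O": 0.05, "0": 0.06},                      # mid-scan confusion dip
    {"C": 0.30, "0": 0.25, "O": 0.20},
    {"0": 0.45, "O": 0.35},                      # zero finally wins
]

def render(frames, threshold=0.10):
    """Turn each frame into 'A/B/C' (ranked) or '.' when nothing is confident."""
    cols = []
    for probs in frames:
        top = sorted((c for c, p in probs.items() if p >= threshold),
                     key=lambda c: -probs[c])
        cols.append("/".join(top) if top else ".")
    return " ".join(cols)

print(render(frames))   # . . O/0/C O/0/C O/0 O O . C/0/O 0/O
```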
Which leads us to the third and last major magic part of the tesseract OCR
engine process: how does tesseract cope with such noisy/burpy LSTM output?
Tesseract is not alone in this; many other LSTM users and researchers
faced the same problem. If you don't do anything about it, chances are a
simple "hello" image input ends up as "lhheeeellooooo" text output, which
is... not so great.
Of course you can feed this to hunspell or similar swift dictionary +
edit-distance based validation and correction technology, get "hello" as
the most probable candidate and take it from there, and many have. But
some folks came up with something that *should* be a bit more flexible
than basic (fast) dictionary lookup + Levenshtein et al, and that
something is built into tesseract alongside the LSTM engine: CTC
(Connectionist Temporal Classification). You can consider it a sort of
LSTM-output post-processor: it is fed those alphabet probability vectors
(one per horizontal input pixel) and outputs a, let's say, *cleaned-up
revision* of them, so that we get "hello" without the obnoxious repeats.
Ditto for the scenario where we were scanning through the pixel columns of
that initial "0" following the "P" in "P01.01".
INTERMISSION 😯: I am still struggling to grok CTCs myself; it's all nice
and dandy when they work out of the box, but I know I don't *get them* yet,
as I don't know what I could do/modify when these buggers *fail*, such as
in the diplopia and OP's "P01.01" scenarios. Hence take my words as a
"possibly-maybe" guide where I talk about CTC, and do read the related
papers, keeping in mind that I may have got it all *wrong*! Just a friendly
warning: YMMV, caveat emptor and all that jazz. 😅
While Wikipedia is leading me astray where it concerns CTCs,
https://sid2697.github.io/Blog_Sid/algorithm/2019/10/19/CTC-Loss.html is
one that proved helpful in my earlier attempts to grok CTCs; check it out
and do your own research.
My current understanding goes something like this. Take that burpy LSTM
output we listed above:
. . O/0/C O/0/C O/0 O O . C/0/O 0/O
which tesseract feeds into its CTC engine part. That part *should ideally*
clean it up and spit out this (what tesseract's CTC *actually* produces
we'll discuss after this):
ϵ ϵ O/0 O/0 O/0 O/0 O/0 O/0 O/0 O/0
where ϵ denotes "empty", i.e. "expect the next decoded character after
this break", so that tesseract would parse the "P01.01" image as "PO1.01",
as "O" is shown as most probable in this example. And yes, that "O
mistake" is intentional! We'll address it at the end, because OP is facing
two practical problems at once:
- the trigger: tesseract, trained to decode books and papers, i.e.
regular/sophisticated human written language, is fed "text codes" here
instead ("P01.01", or Dutch VAT (BTW) "numbers" about a month ago on this
mailing list). These are outside the trained language models and thus way
outside the area of high-quality, low-WER (word error rate) output one may
otherwise expect from an OCR engine like this;
- the root cause (AFAICT): the CTC stage, as a concept, is not engineered
to cope with recognition confusion like this, where the LSTM memory
characteristic ("Markov"-like context-sensitive statistics, roughly
speaking) drives probability towards human-language word forms, which
initially "win out" (O!) and then get flipped halfway through by
increasing shape hints pushing towards numeric digit recognition (0!),
with, possibly, ϵ output producing confusion along the way.
This is where things get really complicated, as you will observe the same
sort of CTC "mistake" happening in the mentioned "diplopia" case: AFAICT
the LSTM engine may output intermittent (rare!) ϵ vectors when input
shapes are *somehow* bad enough that the LSTM engine becomes "unsure"
halfway through a character, and thus produces "heeelloϵooo", resulting in
CTC output "helloo" instead of "hello". Handwave here, as I'm seeing a lot
of noise, debugging this thing is a pain, and as I said: I don't grok CTCs
yet, so reasonable doubt aplenty. (I still keep my options open, as I
sometimes see sequences without that in-the-middle-of-it ϵ and still get
screwed-up final output, so there might be a bug lurking in there, as well
as fundamental outside-training-remit use triggering all sorts of
potential weirdness. Plus my own sub-par comprehension of CTC, which I
warned y'all about... 🤦)
Anyhow, while the "ideal" expected CTC outputs while scanning the initial
"0" zero should be either afore-mentioned:
系 系 O/0 O/0 O/0 O/0 O/0 O/0 O/0 O/0
or even better for OP:
系 系 0/O 0/O 0/O 0/O 0/O 0/O 0/O 0/O
the harsh reality is the CTC stage producing something rather more like
this:
系 系 O/0 O/0 O/0 O/0 O/0 系 0/O 0/O
and that's where it all turns to 馃挬 as the final chunk of tesseract code
takes the CTC output, removes the character runlengths (as it should),
picks the top choice and produces usable text plus character bounding
boxes; since lstm+CTC don't produce character pixel boundary markers by
design, heuristics are used to chop the word image into character boxes,
next to the segmentation-section produced word bounding box:
PO01.01
in this case bounding boxes for "O" and initial "0" largely overlap.
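Running the greedy collapse rule over that "harsh reality" sequence shows where the doubled character comes from. This is a sketch only; the sequences are the illustrative ones from this message, not real engine output:

```python
BLANK = "ϵ"   # CTC blank symbol

def ctc_collapse(frames):
    """Greedy CTC decoding: merge adjacent repeats, then drop blanks."""
    out, prev = [], None
    for ch in frames:
        if ch != prev and ch != BLANK:
            out.append(ch)
        prev = ch
    return "".join(out)

# Ideal top-choice sequence while scanning the initial zero of "P01.01":
ideal = ["ϵ", "ϵ", "0", "0", "0", "0", "0", "0", "0", "0"]
print(ctc_collapse(ideal))    # 0  -> the word decodes to "P01.01"

# The flip described above: "O" wins early, "0" wins late, with a stray
# blank in between -- the collapse rule now emits TWO characters.
flipped = ["ϵ", "ϵ", "O", "O", "O", "O", "O", "ϵ", "0", "0"]
print(ctc_collapse(flipped))  # O0 -> the doubled character in "PO01.01"
```

Note that the stray blank isn't even required for the doubling: "O" runs followed by "0" runs already collapse to "O0", since the two characters differ.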
All of which may be written to hOCR files, CSV or other output formats,
the latter carrying the same or less information.
Which brings us to the end:
Currently, you can detect such diplopia/confusion occasions by
post-processing the tesseract output in userland scripts/code, which
should inspect the character-level bounding boxes to detect these issues
with (hopefully!) high probability. The bits working against making this a
"sure thing" are:
- regular print already has plenty of situations where characters'
bounding boxes overlap a little or a lot: think
+ *cursive & slanted text*
+ ligatures (which may or may not be expanded)
+ negative kerning in the original print ("Ta" and other likely candidates)
- the LSTM (v4/v5) engine doesn't produce character boundary markers, so
character boxes are calculated using heuristics. I still have to dig
through that chunk of code again, but the smell it gives off is one where
(quite sensibly!) some basic font metrics are assumed and character shape
start positions are estimated from the LSTM output. I've yet to
reread/check this code chunk, but my bet today is that this code can also
produce bounding-box overlap, until I'm absolutely sure it can't...
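As a starting point for such userland post-processing, here is a hedged sketch that flags suspiciously overlapping adjacent character boxes. The box tuples, coordinates and the 0.6 threshold are all made up for illustration; get real character boxes however your pipeline provides them (e.g. a tesseract box file or hOCR output):

```python
def h_overlap_ratio(a, b):
    """Horizontal overlap between two boxes, as a fraction of the narrower
    box. Boxes are (char, left, top, right, bottom) tuples."""
    overlap = min(a[3], b[3]) - max(a[1], b[1])
    narrower = min(a[3] - a[1], b[3] - b[1])
    return max(0.0, overlap) / narrower if narrower > 0 else 0.0

def flag_diplopia(boxes, threshold=0.6):
    """Flag adjacent character boxes that overlap suspiciously much.
    The 0.6 threshold is a made-up starting point: cursive text, ligatures
    and negative kerning produce some legitimate overlap, so tune it."""
    return [(a[0], b[0], round(h_overlap_ratio(a, b), 2))
            for a, b in zip(boxes, boxes[1:])
            if h_overlap_ratio(a, b) >= threshold]

# Hypothetical character boxes for the misread "PO01.01" word:
boxes = [
    ("P", 21, 46, 40, 75),
    ("O", 42, 46, 62, 75),   # "O" and "0" land almost on top of each other
    ("0", 44, 46, 63, 75),
    ("1", 66, 46, 78, 75),
    (".", 80, 46, 86, 75),
    ("0", 90, 46, 110, 75),
    ("1", 113, 46, 125, 75),
]
print(flag_diplopia(boxes))   # [('O', '0', 0.95)]
```

Anything flagged can then be re-checked, e.g. by re-OCR'ing the word with a digits-friendly configuration or by simple "alphabetic glyph sandwiched between digits" heuristics.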
HTH,
Ger
On Sat, 2 Dec 2023, 00:50 Tom Morris, <[email protected]> wrote:
> On Thursday, November 30, 2023 at 11:11:08 AM UTC-5 [email protected]
> wrote:
>
> I'm running an image through Tesseract via a PHP library (
> https://github.com/thiagoalessio/tesseract-ocr-for-php).
>
>
> There's a bunch of really useful information missing (e.g. version of
> Tesseract), but fortunately this is easily reproducible with the current
> development version.
>
>
> The output seems to contain two potential matches for a single character.
>
>
> That's not what's happening. It's actually recognizing both characters
> separately, although I'm not sure why. The engine does consider the correct
> string, but the incorrect string scores higher. I'm not familiar enough
> with the internals to interpret it, but the debug output is below in case
> someone else wants to give it a go. As you can see it considers both P01.01
> and P0O1.01, but picks the latter because it's got a (marginally) better
> score.
>
> Tom
>
> Processing word with lang eng at:Bounding box=(21,46)->(143,75)
> Trying word using lang eng, oem 1
> Created window Convolve of size 530, 1700
> Created window ConvNL of size 530, 2000
> Created window Lfys64 of size 530, 2000
> Created window Lfx96 of size 530, 1534
> Created window Lrx96 of size 530, 1534
> Created window Lfx512 of size 530, 2000
> Created window Output of size 530, 1761
> Created window LSTMForward of size 1418, 580
> <null>=110 On [0, 2), scores= 100(i=83=0.00155) 100(P=28=0.00331),
> Mean=99.9928, max=99.9963
> P=28 On [2, 8), scores= 26.8(<null>=110=69) 97.1(p=103=2.5)
> 90.3(<null>=110=9.35) 3.18(<null>=110=96.8) 4.89e-05(<null>=110=100)
> 6.7e-05(<null>=110=92.2), Mean=36.2217, max=97.0573
> 0=33 On [8, 9), scores= 67(O=21=31.2), Mean=67.0461, max=67.0461
> O=21 On [9, 13), scores= 56.8(0=33=42.5) 10.5(<null>=110=82)
> 1.01e-05(<null>=110=100) 7.72e-06(<null>=110=100), Mean=16.8307, max=56.8101
> 1=34 On [13, 18), scores= 64.2(<null>=110=34.6) 99.4(l=87=0.559)
> 98.5(<null>=110=1.46) 5.78(<null>=110=94.2) 1.12e-05(<null>=110=100),
> Mean=53.576, max=99.3538
> .=23 On [18, 23), scores= 83.1(<null>=110=16.9) 100(,=15=0.00592)
> 94.8(<null>=110=5.24) 0.0434(<null>=110=100) 2.41e-07(<null>=110=100),
> Mean=55.5837, max=99.9923
> 0=33 On [23, 28), scores= 52.6(<null>=110=47.4) 99.8(O=21=0.124)
> 82(<null>=110=17.9) 0.000255(<null>=110=100) 3.13e-11(<null>=110=100),
> Mean=46.8875, max=99.8451
> 1=34 On [28, 33), scores= 46.1(<null>=110=53.8) 99.8(l=87=0.0986)
> 99.1(<null>=110=0.894) 0.586(<null>=110=99.4) 8.5e-09(<null>=110=100),
> Mean=49.1089, max=99.8028
> 0 null_char score=-0.191493, c=-0.191493, perm=2, hash=0
> 1 null_char score=-0.382826, c=-0.191333, perm=2, hash=0 prev:null_char
> score=-0.191493, c=-0.191493, perm=2, hash=0
> 2 label=28, uid=30=P [50 ]A score=-0.66966, c=-0.286834, perm=2, hash=1c
> prev:null_char score=-0.382826, c=-0.191333, perm=2, hash=0
> 3 label=28, uid=30=P [50 ]A score=-0.928113, c=-0.258453, perm=2, hash=1c
> prev:label=28, uid=30=P [50 ]A score=-0.66966, c=-0.286834, perm=2, hash=1c
> 4 label=28, uid=30=P [50 ]A score=-1.12845, c=-0.200338, perm=2, hash=1c
> prev:label=28, uid=30=P [50 ]A score=-0.928113, c=-0.258453, perm=2, hash=1c
> 5 null_char score=-1.39284, c=-0.264391, perm=2, hash=1c prev:label=28,
> uid=30=P [50 ]A score=-1.12845, c=-0.200338, perm=2, hash=1c
> 6 null_char score=-1.58412, c=-0.191278, perm=2, hash=1c prev:null_char
> score=-1.39284, c=-0.264391, perm=2, hash=1c
> 7 null_char score=-1.95755, c=-0.373434, perm=2, hash=1c prev:null_char
> score=-1.58412, c=-0.191278, perm=2, hash=1c
> 8 label=33, uid=35=0 [30 ]0 score=-3.04833, c=-1.09078, perm=2, hash=c45
> prev:null_char score=-1.95755, c=-0.373434, perm=2, hash=1c
> 9 label=21, uid=23=O [4f ]A score=-4.51186, c=-1.46353, perm=2, hash=55200
> prev:label=33, uid=35=0 [30 ]0 score=-3.04833, c=-1.09078, perm=2, hash=c45
> 10 label=21, uid=23=O [4f ]A score=-4.87753, c=-0.365671, perm=2,
> hash=55200 prev:label=21, uid=23=O [4f ]A score=-4.51186, c=-1.46353,
> perm=2, hash=55200
> 11 null_char score=-5.06878, c=-0.191256, perm=2, hash=55200
> prev:label=21, uid=23=O [4f ]A score=-4.87753, c=-0.365671, perm=2,
> hash=55200
> 12 null_char score=-5.26093, c=-0.192142, perm=2, hash=55200
> prev:null_char score=-5.06878, c=-0.191256, perm=2, hash=55200
> 13 label=34, uid=36=1 [31 ]0 score=-5.47957, c=-0.218643, perm=2,
> hash=24e8e22 prev:null_char score=-5.26093, c=-0.192142, perm=2, hash=55200
> 14 label=34, uid=36=1 [31 ]0 score=-5.68541, c=-0.205837, perm=2,
> hash=24e8e22 prev:label=34, uid=36=1 [31 ]0 score=-5.47957, c=-0.218643,
> perm=2, hash=24e8e22
> 15 label=34, uid=36=1 [31 ]0 score=-5.91023, c=-0.224826, perm=2,
> hash=24e8e22 prev:label=34, uid=36=1 [31 ]0 score=-5.68541, c=-0.205837,
> perm=2, hash=24e8e22
> 16 label=34, uid=36=1 [31 ]0 score=-6.10178, c=-0.191543, perm=2,
> hash=24e8e22 prev:label=34, uid=36=1 [31 ]0 score=-5.91023, c=-0.224826,
> perm=2, hash=24e8e22
> 17 null_char score=-6.29319, c=-0.191418, perm=2, hash=24e8e22
> prev:label=34, uid=36=1 [31 ]0 score=-6.10178, c=-0.191543, perm=2,
> hash=24e8e22
> 18 label=23, uid=25=. [2e ]p score=-6.48452, c=-0.191326, perm=2,
> hash=1000fa0d5 prev:null_char score=-6.29319, c=-0.191418, perm=2,
> hash=24e8e22
> 19 label=23, uid=25=. [2e ]p score=-6.67594, c=-0.191424, perm=2,
> hash=1000fa0d5 prev:label=23, uid=25=. [2e ]p score=-6.48452, c=-0.191326,
> perm=2, hash=1000fa0d5
> 20 label=23, uid=25=. [2e ]p score=-6.8672, c=-0.191259, perm=2,
> hash=1000fa0d5 prev:label=23, uid=25=. [2e ]p score=-6.67594, c=-0.191424,
> perm=2, hash=1000fa0d5
> 21 null_char score=-7.05943, c=-0.192229, perm=2, hash=1000fa0d5
> prev:label=23, uid=25=. [2e ]p score=-6.8672, c=-0.191259, perm=2,
> hash=1000fa0d5
> 22 null_char score=-7.2507, c=-0.191266, perm=2, hash=1000fa0d5
> prev:null_char score=-7.05943, c=-0.192229, perm=2, hash=1000fa0d5
> 23 label=33, uid=35=0 [30 ]0 score=-7.44305, c=-0.192357, perm=2,
> hash=6f06c6bc7c prev:null_char score=-7.2507, c=-0.191266, perm=2,
> hash=1000fa0d5
> 24 label=33, uid=35=0 [30 ]0 score=-7.63779, c=-0.194738, perm=2,
> hash=6f06c6bc7c prev:label=33, uid=35=0 [30 ]0 score=-7.44305, c=-0.192357,
> perm=2, hash=6f06c6bc7c
> 25 label=33, uid=35=0 [30 ]0 score=-7.83177, c=-0.193978, perm=2,
> hash=6f06c6bc7c prev:label=33, uid=35=0 [30 ]0 score=-7.63779, c=-0.194738,
> perm=2, hash=6f06c6bc7c
> 26 null_char score=-8.02303, c=-0.19126, perm=2, hash=6f06c6bc7c
> prev:label=33, uid=35=0 [30 ]0 score=-7.83177, c=-0.193978, perm=2,
> hash=6f06c6bc7c
> 27 null_char score=-8.21431, c=-0.191279, perm=2, hash=6f06c6bc7c
> prev:null_char score=-8.02303, c=-0.19126, perm=2, hash=6f06c6bc7c
> 28 label=34, uid=36=1 [31 ]0 score=-8.40869, c=-0.194379, perm=2,
> hash=3023f02bb9e6 prev:null_char score=-8.21431, c=-0.191279, perm=2,
> hash=6f06c6bc7c
> 29 label=34, uid=36=1 [31 ]0 score=-8.60438, c=-0.195692, perm=2,
> hash=3023f02bb9e6 prev:label=34, uid=36=1 [31 ]0 score=-8.40869,
> c=-0.194379, perm=2, hash=3023f02bb9e6
> 30 label=34, uid=36=1 [31 ]0 score=-8.79638, c=-0.192, perm=2,
> hash=3023f02bb9e6 prev:label=34, uid=36=1 [31 ]0 score=-8.60438,
> c=-0.195692, perm=2, hash=3023f02bb9e6
> 31 null_char score=-9.00085, c=-0.204469, perm=2, hash=3023f02bb9e6
> prev:label=34, uid=36=1 [31 ]0 score=-8.79638, c=-0.192, perm=2,
> hash=3023f02bb9e6
> 32 null_char score=-9.1921, c=-0.191251, perm=2, hash=3023f02bb9e6
> prev:null_char score=-9.00085, c=-0.204469, perm=2, hash=3023f02bb9e6
>
> Second choice path:
> 2 30=P [50 ]A r=1.12845, c=-0.286834, s=0, e=0, perm=2
> 8 35=0 [30 ]0 r=3.98936, c=-2.11872, s=0, e=0, perm=2
> 13 36=1 [31 ]0 r=1.86124, c=-0.636992, s=0, e=0, perm=2
> 18 25=. [2e ]p r=0.765426, c=-0.191424, s=0, e=0, perm=2
> 23 35=0 [30 ]0 r=0.964568, c=-0.194738, s=0, e=0, perm=2
> 28 36=1 [31 ]0 r=1.36033, c=-0.204469, s=0, e=0, perm=2
> Path total rating = 10.0694
> 2 30=P [50 ]A r=1.12845, c=-0.286834, s=0, e=0, perm=2
> 8 35=0 [30 ]0 r=1.91988, c=-1.09078, s=0, e=0, perm=2
> 9 23=O [4f ]A r=1.8292, c=-1.46353, s=0, e=0, perm=2
> 13 36=1 [31 ]0 r=1.22425, c=-0.224826, s=0, e=0, perm=2
> 18 25=. [2e ]p r=0.765426, c=-0.191424, s=0, e=0, perm=2
> 23 35=0 [30 ]0 r=0.964568, c=-0.194738, s=0, e=0, perm=2
> 28 36=1 [31 ]0 r=1.36033, c=-0.204469, s=0, e=0, perm=2
> Path total rating = 9.1921
> Best choice: accepted=0, adaptable=0, done=1 : Lang result : P0O1.01 :
> R=9.1921, C=-10.2447, F=1, Perm=2, xht=[0,3.40282e+38], ambig=0
> pos NORM NORM NORM NORM NORM NORM NORM
> str P 0 O 1 . 0 1
> state: 1 1 1 1 1 1 1
> C -0.287 -1.091 -1.464 -0.225 -0.191 -0.195 -0.204
> 1 new words better than 0 old words: r: 9.1921 v 0 c: -10.2447 v 0 valid
> dict: 0 v 0
>
>
>
--
You received this message because you are subscribed to the Google Groups
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/tesseract-ocr/CAFP60fqTuk%3DtZq5EVT0Dx5FjSFsQ1Y%2BTOK0cPpiPo2oUktE6aQ%40mail.gmail.com.