Hi Ger,

Thanks a lot for the detailed guidance — it was really helpful.

I ran deeper diagnostics and confirmed a few things:

   - Running *Tesseract CLI* directly works perfectly and extracts: *NO
     SMOKING*

   - However, when using *gosseract* from Go, I still get *empty text output*
     and a single empty bounding box like:
     Text: [ ],  Box: (1476397136,32579)-(1476956064,32579)

     The image being processed is a valid 8-bit/16-bit PNG (confirmed
     via file command).

   - Setting *TESSDATA_PREFIX* or *SetTessdataPrefix("/usr/share/tessdata")*
     works correctly, with no language load errors.

   - Even after forcing engine mode with *tessedit_ocr_engine_mode = 1* (LSTM
     only) and using *PSM_SPARSE_TEXT*, gosseract still returns empty text.

   - This makes me think gosseract is initializing Tesseract differently
     (maybe not loading the same configs or missing something in the setup
     phase), because the CLI and Go layer are using the same image and tessdata.
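
A note on the box reported above: (1476397136,32579)-(1476956064,32579) has zero height (both y values are 32579) and x values in the billions, so it cannot be a real region of the image. That may point at a result that was never filled in rather than a failed detection, though that is a guess. Below is a minimal Go sketch of a sanity check for boxes before trusting them. It assumes boxes are available as image.Rectangle (which, as far as I can tell, is how gosseract's BoundingBox exposes them); the 512x512 image size and the name plausibleBox are made up for illustration.

```go
package main

import (
	"fmt"
	"image"
)

var (
	// The box reported in this thread.
	reportedBox = image.Rect(1476397136, 32579, 1476956064, 32579)
	// Assumed image size, for illustration; use the real PNG dimensions.
	imageBounds = image.Rect(0, 0, 512, 512)
)

// plausibleBox reports whether box could be a real detection: it must have
// a non-zero area and lie entirely inside the image bounds.
// The Empty check comes first because Rectangle.In treats an empty
// rectangle as being inside everything.
func plausibleBox(box, bounds image.Rectangle) bool {
	return !box.Empty() && box.In(bounds)
}

func main() {
	fmt.Println("reported box plausible:", plausibleBox(reportedBox, imageBounds))

	// A made-up box inside the image, for contrast.
	inside := image.Rect(40, 200, 470, 300)
	fmt.Println("in-image box plausible:", plausibleBox(inside, imageBounds))
}
```

Running a check like this over every returned box separates "Tesseract found nothing" from "the binding handed back garbage", which are different bugs.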

Do you have any suggestions for checking whether gosseract is properly
initializing TessBaseAPI with the same defaults as CLI?
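
One way to make that comparison concrete is to diff the parameter values from both sides. The CLI can print its parameters with `tesseract --print-parameters`, one tab-separated "name, value, description" entry per line. On the library side, TessBaseAPI has PrintVariables, but I have not checked whether gosseract exposes it, so getting that dump may need a small addition to the bridge code; treat that part as an assumption. Given two such dumps as text, a rough Go sketch of the diff could look like this (parseParams and diffParams are names made up here, and the sample values in main are invented, not measurements):

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strings"
)

// parseParams reads a dump with one "name<TAB>value<TAB>description" entry
// per line and returns a name -> value map. Lines without a tab (such as a
// heading) are skipped.
func parseParams(dump string) map[string]string {
	params := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(dump))
	for sc.Scan() {
		fields := strings.SplitN(sc.Text(), "\t", 3)
		if len(fields) < 2 {
			continue
		}
		params[fields[0]] = fields[1]
	}
	return params
}

// diffParams returns one sorted line per parameter that is missing on one
// side or has a different value on each side.
func diffParams(cli, lib map[string]string) []string {
	var out []string
	for name, cv := range cli {
		if lv, ok := lib[name]; !ok {
			out = append(out, fmt.Sprintf("%s: cli=%q lib=<missing>", name, cv))
		} else if lv != cv {
			out = append(out, fmt.Sprintf("%s: cli=%q lib=%q", name, cv, lv))
		}
	}
	for name, lv := range lib {
		if _, ok := cli[name]; !ok {
			out = append(out, fmt.Sprintf("%s: cli=<missing> lib=%q", name, lv))
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Invented dumps for illustration only.
	cliDump := "Tesseract parameters:\n" +
		"tessedit_pageseg_mode\t3\tPage seg mode\n" +
		"user_defined_dpi\t0\tSpecify DPI for input image\n"
	libDump := "tessedit_pageseg_mode\t6\tPage seg mode\n" +
		"user_defined_dpi\t0\tSpecify DPI for input image\n"

	for _, line := range diffParams(parseParams(cliDump), parseParams(libDump)) {
		fmt.Println(line)
	}
}
```

Note that `--print-parameters` on its own shows defaults, so the CLI dump should be taken with the same options and config files as the run that produces *NO SMOKING*.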

Thanks again for your help — your earlier hint about checking bounding
boxes and configuration alignment was spot on.
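
For the CLI side of that bounding-box comparison, the stock `tsv` config (`tesseract image.png stdout tsv`) prints one row per page, block, paragraph, line and word, with left/top/width/height columns. Here is a small Go sketch that pulls the word-level rows (level 5) out of that output so they can be compared with what GetBoundingBoxes returns. The type wordBox, the function parseTSVWords, and the coordinates in the sample are made up for illustration; they are not real output for this image.

```go
package main

import (
	"fmt"
	"image"
	"strconv"
	"strings"
)

// wordBox is one recognized word and its rectangle in image coordinates.
type wordBox struct {
	Text string
	Box  image.Rectangle
}

// parseTSVWords extracts word-level rows (level 5) from Tesseract TSV output.
// Expected columns: level, page_num, block_num, par_num, line_num, word_num,
// left, top, width, height, conf, text.
func parseTSVWords(tsv string) ([]wordBox, error) {
	var words []wordBox
	for _, line := range strings.Split(strings.TrimSpace(tsv), "\n") {
		f := strings.Split(strings.TrimRight(line, "\r"), "\t")
		if len(f) < 12 || f[0] != "5" {
			continue // header row, or a page/block/paragraph/line row
		}
		var n [4]int // left, top, width, height
		for i := range n {
			v, err := strconv.Atoi(f[6+i])
			if err != nil {
				return nil, fmt.Errorf("bad coordinate %q: %w", f[6+i], err)
			}
			n[i] = v
		}
		words = append(words, wordBox{
			Text: f[11],
			Box:  image.Rect(n[0], n[1], n[0]+n[2], n[1]+n[3]),
		})
	}
	return words, nil
}

func main() {
	// Invented sample in the TSV layout; not real output for this image.
	sample := "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\tleft\ttop\twidth\theight\tconf\ttext\n" +
		"1\t1\t0\t0\t0\t0\t0\t0\t512\t512\t-1\t\n" +
		"5\t1\t1\t1\t1\t1\t40\t200\t120\t60\t96.0\tNO\n" +
		"5\t1\t1\t1\t1\t2\t180\t200\t290\t60\t95.5\tSMOKING\n"

	words, err := parseTSVWords(sample)
	if err != nil {
		panic(err)
	}
	for _, w := range words {
		fmt.Printf("%q %v\n", w.Text, w.Box)
	}
}
```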

Best regards,
Harshit

On Sun, Nov 2, 2025 at 4:09 AM Ger Hobbelt <[email protected]> wrote:

> I expect you're in for a debug session.
>
> I do not use Go, so here's just a few general tidbits:
>
> - you tested with the tesseract CLI. Excellent! So that proves things can
> go well at the core; one major problem area less to worry about.
> - next is the gosseract library/layer itself: how does it talk to
> tesseract, what does it pass (and what doesn't it), etc.: from a very swift
> glance at the code, there's nothing blatantly obviously wrong in their
> tessbridge.cpp, AFAICT. Haven't looked any further than that.
> - my own usage of tesseract as a library has shown me that getting the
> parameters right can be a bit of a hassle sometimes; one of the potential
> failure modes is not noting that tesseract does not receive the same config
> baseline setup as when it ran via CLI: this is where debugging is mandatory.
>
> My first guess would be to make very sure your tesseract config files are
> loaded the same way. While that can be a bit harsh to do when you're not
> comfortable with running this stuff in a debugger, here's a preparation
> step I would definitely look at if I were you:
> 1. tesseract via your Go code doesn't produce *anything*, while
> 2. tesseract CLI does deliver text ("No smoking")
> which MAY be due to tesseract not finding any text word bounding-boxes
> when run via the Go-code route.
>
> I see they (gosseract) present a GetBoundingBoxes API, so I would first
> try to run that one to see if I get any boxes at all, and if any, where
> they are in the image (i.e.: do I get (a) no boxes, (b) only gibberish
> boxes, (c) at least the ones covering "NO" and "SMOKING", or what?).
> Then try the same for the CLI (IIRC vanilla tesseract has an option to
> cough up bboxes only; haven't used that in a while and I'm running a
> customized tesseract here, so check code and documentation, don't take me
> at my word!)
>
> To see what I was looking at:
> https://github.com/otiai10/gosseract/blob/main/tessbridge.cpp#L108
>
> If the bounding boxes don't show up in your Go run, then it smells like a
> config/setup bit not making it into the tesseract engine, so it's debugging
> the gosseract tessbridge.cpp interlayer to see what happens, really. Are CLI
> and Go code really, really pointing at the same config search paths, for
> example?
> If the bounding boxes show up and match the set in the CLI, we have a
> serious conundrum.
>
> Either way, that's the road I'd travel if walking in your shoes.
> (If you can debug-step the tesseract CLI the same way, you can more easily
> compare both, perhaps, as the CLI is using the same APIs gosseract is using,
> with some differences, but my current bet is those are not relevant.)
>
> Also monitor the gosseract/tesseract run for error and warning messages
> from tesseract. If it is silent, maybe force it once to barf a
> hairball, just so you know the error/warning/info outputs are working.
> Whatever you do, my bet is you have some debugging on the road ahead.
>
> Note: I don't do Go, so haven't used gosseract. This would be my general
> tactic though, anyway.
>
>
> Met vriendelijke groeten / Best regards,
>
> Ger Hobbelt
>
> --------------------------------------------------
> web:    http://www.hobbelt.com/
>         http://www.hebbut.net/
> mail:   [email protected]
> mobile: +31-6-11 120 978
> --------------------------------------------------
>
>
> On Fri, Oct 31, 2025 at 6:45 PM Harshit Goel <[email protected]>
> wrote:
>
>> Hi team
>>
>> I’m facing an issue where Tesseract OCR works correctly from the CLI, but
>> returns an empty string when called programmatically using Go (via
>> gosseract).
>>
>> For this particular image:
>> https://pmi-api.ubconnex.ca/files/icons/2025-03/11c6051eec503f52c43f0de382980d31.png,
>> the OCR always returns an empty string when running programmatically. Yet
>> when I run the exact same image manually using Tesseract from the terminal with
>> the command: *tesseract /tmp/ocr-3678469497.png stdout*
>>
>> It correctly detects and returns *NO SMOKING*
>>
>> *Environment*
>>
>>    - OS: Linux (Server)
>>    - Tesseract version: tesseract 5.x (CLI works fine)
>>    - Go binding: github.com/otiai10/gosseract/v2
>>    - Go version: go1.23.x
>>
>> I've tried the following approaches, but with no effect:
>>
>>    - Different PSM modes (SPARSE_TEXT, SINGLE_BLOCK, etc.)
>>    - Preprocessing (grayscale, contrast enhancement, flattening
>>      transparency).
>>    - Verified that the image file is saved correctly and readable by
>>      Tesseract.
>>    - Tried increasing image size and contrast.
>>
>> Is there any known discrepancy between the CLI binary and the gosseract
>> API in how page segmentation modes or image preprocessing are handled
>> internally?
>>
>> Any insight on why Tesseract detects text via the CLI but the gosseract binding
>> returns empty output would be very helpful.
>>
>> Best Regards,
>>
>> Harshit Goel
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion visit
>> https://groups.google.com/d/msgid/tesseract-ocr/54875e13-9f91-4f45-9eb8-ee8eec4e5846n%40googlegroups.com
>> <https://groups.google.com/d/msgid/tesseract-ocr/54875e13-9f91-4f45-9eb8-ee8eec4e5846n%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>

