Not a bug; tess-two is just not up to date with Tesseract 4.x.
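
To make the mismatch concrete, here is a minimal sketch in plain Java (no Android or tess-two dependency) that tabulates the two sets of constants quoted below. The class OemComparison and its helper methods are hypothetical names used only for illustration, and the Tesseract 4.x integer values are assumed from the declaration order of the quoted enum. It only compares the constants; it does not call either library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of tess-two or Tesseract.
final class OemComparison {

    // Values copied from the tess-two constants quoted in this thread.
    static Map<String, Integer> tessTwoModes() {
        Map<String, Integer> modes = new LinkedHashMap<>();
        modes.put("OEM_TESSERACT_ONLY", 0);
        modes.put("OEM_CUBE_ONLY", 1);
        modes.put("OEM_TESSERACT_CUBE_COMBINED", 2);
        modes.put("OEM_DEFAULT", 3);
        return modes;
    }

    // Assumption: values follow the declaration order of the quoted
    // Tesseract 4.x enum. OEM_COUNT is left out because it is a sentinel,
    // not an engine mode.
    static Map<String, Integer> tesseract4Modes() {
        Map<String, Integer> modes = new LinkedHashMap<>();
        modes.put("OEM_TESSERACT_ONLY", 0);
        modes.put("OEM_LSTM_ONLY", 1);
        modes.put("OEM_TESSERACT_LSTM_COMBINED", 2);
        modes.put("OEM_DEFAULT", 3);
        return modes;
    }

    // Returns the name mapped to the given value, or null if there is none.
    static String nameOf(Map<String, Integer> modes, int value) {
        for (Map.Entry<String, Integer> entry : modes.entrySet()) {
            if (entry.getValue() == value) {
                return entry.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Print both names side by side for each integer value.
        for (int value = 0; value <= 3; value++) {
            System.out.println(value
                    + ": tess-two=" + nameOf(tessTwoModes(), value)
                    + ", Tesseract 4.x=" + nameOf(tesseract4Modes(), value));
        }
        System.out.println("tess-two defines OEM_LSTM_ONLY: "
                + tessTwoModes().containsKey("OEM_LSTM_ONLY"));
    }
}
```

The same integer, 1, is named OEM_CUBE_ONLY in tess-two and OEM_LSTM_ONLY in Tesseract 4.x, and tess-two exposes no OEM_LSTM_ONLY constant at all. Passing 1 to tess-two should therefore not be assumed to select the LSTM engine.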
On Monday, December 9, 2019 at 12:14:49 AM UTC-6, NY C wrote:
>
> I know there are new OcrEngineMode values in Tesseract,
> but not in tess-two.
>
> In Tesseract 4.x, OcrEngineMode is:
>
> enum OcrEngineMode {
>   OEM_TESSERACT_ONLY,           // Run Tesseract only - fastest; deprecated
>   OEM_LSTM_ONLY,                // Run just the LSTM line recognizer.
>   OEM_TESSERACT_LSTM_COMBINED,  // Run the LSTM recognizer, but allow fallback
>                                 // to Tesseract when things get difficult.
>                                 // deprecated
>   OEM_DEFAULT,                  // Specify this mode when calling init_*(),
>                                 // to indicate that any of the above modes
>                                 // should be automatically inferred from the
>                                 // variables in the language-specific config,
>                                 // command-line configs, or if not specified
>                                 // in any of the above should be set to the
>                                 // default OEM_TESSERACT_ONLY.
>   OEM_COUNT                     // Number of OEMs
> };
>
> However, in the newest release of tess-two, OcrEngineMode is:
>
> @IntDef({OEM_TESSERACT_ONLY, OEM_CUBE_ONLY,
> OEM_TESSERACT_CUBE_COMBINED, OEM_DEFAULT})
> public @interface OcrEngineMode {}
> public static final int OEM_TESSERACT_ONLY = 0;
> @Deprecated
> public static final int OEM_CUBE_ONLY = 1;
> @Deprecated
> public static final int OEM_TESSERACT_CUBE_COMBINED = 2;
> public static final int OEM_DEFAULT = 3;
>
> If there is no way to set OEM_LSTM_ONLY in tess-two,
> I can only assume this is a bug in tess-two.
>
>
>
> On Monday, December 9, 2019 at 12:38:56 AM UTC+8, Quan Nguyen wrote:
>>
>> There are new OcrEngineMode values
>> <https://github.com/tesseract-ocr/tesseract/blob/master/include/tesseract/publictypes.h>.
>>
>>
>> On Saturday, December 7, 2019 at 7:37:49 PM UTC-6, NY C wrote:
>>>
>>> Hi, I am using tess-two for OCR.
>>>
>>>
>>> (Alex Cohn's version: https://github.com/alexcohn/tess-two)
>>>
>>>
>>> Code:
>>>
>>> TessBaseAPI baseApi = new TessBaseAPI();
>>> baseApi.setDebug(true);
>>> baseApi.init(pathfiles, language);
>>> //baseApi.setVariable(TessBaseAPI.VAR_CHAR_WHITELIST, "0123456789");
>>> baseApi.setPageSegMode(TessBaseAPI.PageSegMode.PSM_AUTO);
>>> baseApi.setImage(bmp);
>>> result = baseApi.getUTF8Text();
>>> baseApi.end();
>>>
>>>
>>> The code runs perfectly when I use this tessdata:
>>> https://github.com/tesseract-ocr/tessdata
>>>
>>> But when I use tessdata_fast
>>> (https://github.com/tesseract-ocr/tessdata_fast), the code crashes on
>>> baseApi.init.
>>>
>>>
>>> There is no error message since the init method calls native C++. As far
>>> as I can trace, the init method crashes on this line:
>>>
>>> boolean success = nativeInitOem(mNativeData, datapath, language,
>>> ocrEngineMode);
>>>
>>>
>>> I also tried to set the OEM like this:
>>>
>>> baseApi.init(pathfiles, language, TessBaseAPI.OEM_CUBE_ONLY);
>>>
>>>
>>> I have tried all of the OEM values:
>>>
>>> (OEM_TESSERACT_ONLY = 0, OEM_CUBE_ONLY = 1, OEM_TESSERACT_CUBE_COMBINED
>>> = 2, OEM_DEFAULT = 3)
>>>
>>> It crashes with every one of them.
>>>
>>>
>>> How could I fix this?
>>>
>>>
>>>
>>>