[tesseract-ocr] Re: tess-two with tessdata_fast crashes

2019-12-08 Thread NY C
I know there are new OcrEngineMode values in Tesseract,
but they are not available in tess-two.

In Tesseract 4.x, OcrEngineMode is:

enum OcrEngineMode {
  OEM_TESSERACT_ONLY,           // Run Tesseract only - fastest; deprecated
  OEM_LSTM_ONLY,                // Run just the LSTM line recognizer.
  OEM_TESSERACT_LSTM_COMBINED,  // Run the LSTM recognizer, but allow fallback
                                // to Tesseract when things get difficult.
                                // deprecated
  OEM_DEFAULT,                  // Specify this mode when calling init_*(),
                                // to indicate that any of the above modes
                                // should be automatically inferred from the
                                // variables in the language-specific config,
                                // command-line configs, or if not specified
                                // in any of the above should be set to the
                                // default OEM_TESSERACT_ONLY.
  OEM_COUNT                     // Number of OEMs
};

However, in the newest release of tess-two, OcrEngineMode is:

@IntDef({OEM_TESSERACT_ONLY, OEM_CUBE_ONLY, OEM_TESSERACT_CUBE_COMBINED, OEM_DEFAULT})
public @interface OcrEngineMode {}
public static final int OEM_TESSERACT_ONLY = 0;
@Deprecated
public static final int OEM_CUBE_ONLY = 1;
@Deprecated
public static final int OEM_TESSERACT_CUBE_COMBINED = 2;
public static final int OEM_DEFAULT = 3;

If there is no way to set OEM_LSTM_ONLY in tess-two,
I can only assume this is a bug in tess-two.
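
To make the comparison concrete, here is a minimal Java sketch that lines up the two sets of names by integer value. The class OemComparison and its describe method are hypothetical helpers written only for this illustration; they are not part of tess-two or Tesseract. The tables simply copy the values quoted in this message, assuming the Tesseract 4.x enum members are numbered from 0 in declaration order.

```java
// Illustrative only: both tables copy values quoted in this thread.
final class OemComparison {
    // Tesseract 4.x names in enum order; a C++ enum without explicit
    // values numbers its members from 0.
    static final String[] TESSERACT_4 = {
        "OEM_TESSERACT_ONLY",
        "OEM_LSTM_ONLY",
        "OEM_TESSERACT_LSTM_COMBINED",
        "OEM_DEFAULT"
    };

    // tess-two 9.0.0 names, indexed by the int value of each constant.
    static final String[] TESS_TWO = {
        "OEM_TESSERACT_ONLY",
        "OEM_CUBE_ONLY",
        "OEM_TESSERACT_CUBE_COMBINED",
        "OEM_DEFAULT"
    };

    // Describes what one integer value is called on each side.
    static String describe(int value) {
        return value + ": tess-two " + TESS_TWO[value]
                + " / Tesseract 4.x " + TESSERACT_4[value];
    }

    public static void main(String[] args) {
        for (int value = 0; value < TESS_TWO.length; value++) {
            System.out.println(describe(value));
        }
    }
}
```

The sketch only shows that the value 1 carries the name OEM_CUBE_ONLY in tess-two and OEM_LSTM_ONLY in Tesseract 4.x. It says nothing about what the native code bundled with tess-two does with that value; as reported in this thread, all four values crash with tessdata_fast.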



On Monday, December 9, 2019 at 12:38:56 AM UTC+8, Quan Nguyen wrote:
>
> There are new OcrEngineMode values.
>
>
> On Saturday, December 7, 2019 at 7:37:49 PM UTC-6, NY C wrote:
>>
>> Hi, I am using tess-two for OCR.
>>
>> (Alex Cohn's version: https://github.com/alexcohn/tess-two)
>>
>>
>> Code:
>>
>> TessBaseAPI baseApi = new TessBaseAPI();
>> baseApi.setDebug(true);
>> baseApi.init(pathfiles, language);
>> //baseApi.setVariable(TessBaseAPI.VAR_CHAR_WHITELIST, "0123456789");
>> baseApi.setPageSegMode(TessBaseAPI.PageSegMode.PSM_AUTO);
>> baseApi.setImage(bmp);
>> result= baseApi.getUTF8Text();
>> baseApi.end();
>>
>>
>> The code runs perfectly when I use this tessdata:
>> https://github.com/tesseract-ocr/tessdata
>>
>> But when I use tessdata_fast
>> (https://github.com/tesseract-ocr/tessdata_fast), the code crashes on
>> baseApi.init.
>>
>>
>> There is no error message, since the init method calls native C++ code. As
>> far as I can trace, the init method crashes on this line:
>>
>> boolean success = nativeInitOem(mNativeData, datapath, language, ocrEngineMode);
>>
>>
>> I also tried to set the OEM like this: 
>>
>>   baseApi.init(pathfiles, language, TessBaseAPI.OEM_CUBE_ONLY);
>>
>>
>> I have tried all the OEM parameters
>> (OEM_TESSERACT_ONLY = 0, OEM_CUBE_ONLY = 1, OEM_TESSERACT_CUBE_COMBINED = 2,
>> OEM_DEFAULT = 3), and it crashes with each of them.
>>
>>
>> How could I fix this?
>>
>>
>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/4a0d7fba-73fe-43d9-96e7-55072b82f876%40googlegroups.com.


Re: [tesseract-ocr] I cannot use traineddata downloaded from Data Files

2019-12-08 Thread 坂本聖
Here is the output:

$ tesseract --version
tesseract 4.0.0-beta.1
leptonica-1.75.3
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : libtiff 
4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0

 Found AVX2
 Found AVX
 Found SSE

On Monday, December 9, 2019 at 1:43:52 UTC+9, zdenop wrote:
>
> What is the output of:
>  tesseract --version
>
> Zdenko

Re: [tesseract-ocr] I cannot use traineddata downloaded from Data Files

2019-12-08 Thread Zdenko Podobny
What is the output of:
 tesseract --version

Zdenko



[tesseract-ocr] Re: tess-two with tessdata_fast crashes

2019-12-08 Thread Quan Nguyen
There are new OcrEngineMode values.




Re: [tesseract-ocr] Re: Tesseract 4.1.0 released

2019-12-08 Thread NY C
Also, I think CUBE has been removed from Tesseract 4.x.
I find it very strange that there is no suitable OEM value in tess-two
9.0.0.

Could somebody help me here? Am I missing anything needed to make
tessdata_fast work in tess-two?



On Saturday, December 7, 2019 at 5:37:59 PM UTC+8, NY C wrote:
>
> I changed the OEM to this, as you said:
> baseApi.init(pathfiles, language, TessBaseAPI.OEM_CUBE_ONLY);
> but it still crashes.
>
> I tried all the parameters I could find
> (OEM_TESSERACT_ONLY = 0, OEM_CUBE_ONLY = 1, OEM_TESSERACT_CUBE_COMBINED = 2,
> OEM_DEFAULT = 3).
> They all crash on the same line.
>
>




Re: [tesseract-ocr] Re: Tesseract 4.1.0 released

2019-12-08 Thread NY C
Also, I think CUBE has been removed from Tesseract 4.x.
I find it strange that tess-two 9.0.0 still has this CUBE OEM value.

Could somebody help me here? Am I missing anything needed to make
tessdata_fast work in tess-two?




Re: [tesseract-ocr] tess-two with tessdata_fast crashes

2019-12-08 Thread NY C
I have tried all those 4 OEM values.



Re: [tesseract-ocr] tess-two with tessdata_fast crashes

2019-12-08 Thread NY C
I see.

However, I can find only four OEM parameters in the tess-two source code:

@IntDef({OEM_TESSERACT_ONLY, OEM_CUBE_ONLY, OEM_TESSERACT_CUBE_COMBINED, OEM_DEFAULT})
public @interface OcrEngineMode {}

/** Run Tesseract only - fastest */
public static final int OEM_TESSERACT_ONLY = 0;

/** Run Cube only - better accuracy, but slower */
@Deprecated
public static final int OEM_CUBE_ONLY = 1;

/** Run both and combine results - best accuracy */
@Deprecated
public static final int OEM_TESSERACT_CUBE_COMBINED = 2;

/** Default OCR engine mode. */
public static final int OEM_DEFAULT = 3;

I honestly cannot find a suitable OEM parameter, and I don't think there is
any other OEM parameter in tess-two.
(Again, the version I use is https://github.com/alexcohn/tess-two, 9.0.0.)

Could you please give me some more tips?





Re: [tesseract-ocr] I cannot use traineddata downloaded from Data Files

2019-12-08 Thread 坂本聖
Thanks for your advice.
I downloaded the file by clicking the "Download" button at
https://github.com/tesseract-ocr/tessdata/blob/master/chi_sim.traineddata.
I moved the chi_sim.traineddata file to
/usr/share/tesseract-ocr/4.00/tessdata/ and confirmed that the file (42.3 MB
in size) is there.
But I still cannot use tesseract.
As I said, tesseract works with the file installed by running sudo
apt install tesseract-ocr-chi-sim, but the data downloaded from Data Files
does not work.
I cannot understand why.
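
One way to carry out the size check that Zdenko asked for in this thread is sketched below in Java. TrainedDataCheck and its check method are hypothetical names used only for this illustration. The expected byte count is an assumption that must be read off the repository page by hand, and the readability test is an extra check added here, not a cause confirmed anywhere in the thread.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: checks that a traineddata file is present,
// readable by the current user, and has the expected size in bytes.
final class TrainedDataCheck {
    static String check(Path file, long expectedBytes) {
        if (!Files.isRegularFile(file)) {
            return "missing";
        }
        if (!Files.isReadable(file)) {
            return "not readable";
        }
        try {
            long actual = Files.size(file);
            return actual == expectedBytes
                    ? "ok"
                    : "size mismatch: " + actual + " bytes";
        } catch (IOException e) {
            return "cannot read size: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Default path taken from the error message in this thread.
        Path file = Paths.get(args.length > 0 ? args[0]
                : "/usr/share/tesseract-ocr/4.00/tessdata/chi_sim.traineddata");
        // Expected size comes from the caller; -1 never matches, so the
        // actual size is printed when no second argument is given.
        long expected = args.length > 1 ? Long.parseLong(args[1]) : -1L;
        System.out.println(file + ": " + check(file, expected));
    }
}
```

Running it with only the path prints the actual size in bytes, which can then be compared with the size shown in the repository.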

On Sunday, December 8, 2019 at 23:15:31 UTC+9, zdenop wrote:
>
> How did you download the files from the repository?
> Please check whether the files in /usr/share/tesseract-ocr/4.00/tessdata/
> have the same size as in the repository.
>
> Zdenko
>
>
> On Sat, Dec 7, 2019 at 17:34, 坂本聖 wrote:
>
>> Hi,
>> I want to use tesseract for Chinese text. So I first ran the command
>> sudo apt install tesseract-ocr-chi-sim
>> After that, I can find chi_sim.traineddata in
>> /usr/share/tesseract-ocr/4.00/tessdata and can confirm it like this (I also
>> installed chi_tra and jpn):
>>
>> $ tesseract --list-langs
>>
>> List of available languages (5):
>>
>> chi_sim
>>
>> chi_tra
>>
>> eng
>>
>> jpn
>>
>> osd
>>
>>
>> Tesseract does work, but I want more accurate OCR, so I
>> want to use the chi_sim.traineddata downloaded from here:
>> https://github.com/tesseract-ocr/tessdata/blob/master/chi_sim.traineddata
>> After I ran the command
>> sudo apt remove tesseract-ocr-chi-sim
>> I put the new chi_sim.traineddata in
>> /usr/share/tesseract-ocr/4.00/tessdata and tried to use tesseract,
>> but it fails like this:
>>
>> $ tesseract 0.jpeg output -l chi_sim
>>
>> Error opening data file 
>> /usr/share/tesseract-ocr/4.00/tessdata/chi_sim.traineddata
>>
>> Please make sure the TESSDATA_PREFIX environment variable is set to your 
>> "tessdata" directory.
>>
>> Failed loading language 'chi_sim'
>>
>> Tesseract couldn't load any languages!
>>
>> Could not initialize tesseract.
>>
>>
>> Then I tried this, but it also fails:
>>
>>
>> $ tesseract 0.jpeg output -l chi_sim --tessdata-dir /usr/share/tesseract-ocr/4.00/tessdata
>>
>> Error opening data file 
>> /usr/share/tesseract-ocr/4.00/tessdata/chi_sim.traineddata
>>
>> Please make sure the TESSDATA_PREFIX environment variable is set to your 
>> "tessdata" directory.
>>
>> Failed loading language 'chi_sim'
>>
>> Tesseract couldn't load any languages!
>>
>> Could not initialize tesseract.
>>
>>
>> Then I set TESSDATA_PREFIX to /usr/share/tesseract-ocr/4.00/tessdata
>> and tried again, but it still fails.
>>
>>
>> $ export TESSDATA_PREFIX=/usr/share/tesseract-ocr/4.00/tessdata/
>>
>> $ tesseract 0.jpeg output -l chi_sim
>>
>> Error opening data file 
>> /usr/share/tesseract-ocr/4.00/tessdata/chi_sim.traineddata
>>
>> Please make sure the TESSDATA_PREFIX environment variable is set to your 
>> "tessdata" directory.
>>
>> Failed loading language 'chi_sim'
>>
>> Tesseract couldn't load any languages!
>>
>> Could not initialize tesseract.
>>
>>
>> If I list the languages, chi_sim still appears.
>>
>> $ tesseract --list-langs
>>
>> List of available languages (5):
>>
>> chi_sim
>>
>> chi_tra
>>
>> eng
>>
>> jpn
>>
>> osd
>>
>>
>> Please tell me why I cannot use the traineddata downloaded from
>> https://github.com/tesseract-ocr/tessdata/blob/master/chi_sim.traineddata.
>> Did I make a mistake?
>>


[tesseract-ocr] Re: I cannot use traineddata downloaded from Data Files

2019-12-08 Thread 坂本聖
Thanks for your advice. However, I am using Ubuntu on WSL (Windows Subsystem
for Linux), and I have already tried setting TESSDATA_PREFIX by running
$ export TESSDATA_PREFIX=/usr/share/tesseract-ocr/4.00/tessdata/
But I still cannot use tesseract.
Tesseract works with the traineddata installed by sudo apt install
tesseract-ocr-chi-sim, but not with the data downloaded from Data Files.
Can't I use tesseract on WSL (Ubuntu)?
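
As a rough illustration of why the exported value already looks right, the Java sketch below joins a tessdata directory and a language code into a file path. TessdataPath and trainedDataFile are hypothetical names for this example only; this is not Tesseract's own lookup code, just an assumption that the file is expected at <directory>/<language>.traineddata, which matches the path printed in the error message.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: builds the traineddata path from a tessdata
// directory and a language code.
final class TessdataPath {
    static Path trainedDataFile(String tessdataDir, String language) {
        return Paths.get(tessdataDir).resolve(language + ".traineddata");
    }

    public static void main(String[] args) {
        // Reads the same variable that is exported in the shell.
        String prefix = System.getenv("TESSDATA_PREFIX");
        if (prefix == null) {
            System.out.println("TESSDATA_PREFIX is not set");
            return;
        }
        System.out.println(trainedDataFile(prefix, "chi_sim"));
    }
}
```

With TESSDATA_PREFIX exported as shown, the sketch yields the same path that appears in the error message, which suggests the problem lies with the file itself rather than with the variable.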

On Sunday, December 8, 2019 at 11:06:58 UTC+9, NY C wrote:
>
> Try setting the TESSDATA_PREFIX environment variable.
>
>    1. Go to Control Panel -> System -> Advanced System Settings ->
>       Advanced tab -> *Environment Variables...* button
>    2. In the System variables window, scroll down to *TESSDATA_PREFIX*. If
>       it's not right, select it and click *Edit...*


Re: [tesseract-ocr] tess-two with tessdata_fast crashes

2019-12-08 Thread Zdenko Podobny
If you want to use the API, you need to spend some time with the docs and
source code.
You could find out quite quickly that CUBE was removed from Tesseract and
is not available in version 4.

Zdenko

