You make a good point, Zdenko.
If there are limitations on the training data that can be used, or minimum
memory requirements for handling such data in custom training, it would be
good to document them in the wiki, so that people do not waste time and
effort in training if they don't have the minimum
Yes, you can ;-)
If you want to document it, you need to find the reason for the error.
If you want to find the reason, you need to dive into 130 GB of input data...
Enjoy.
IMO the right suggestion is to ask the user to find the file/data that causes
the problem and create minimal input data that demonstrates it. Creating iss
In my opinion, the assert still needs to be documented as an issue, with
LSTM training.
On Tue, 27 Nov 2018, 05:03 Zdenko Podobny wrote:
>
> Shree,
>
> The issue tracker is not for custom training, simply because there are not
> enough people and
> it cannot be reproduced...
> Did you read: "I have been running
Hi,
Can you tell me how you extracted the intermediate binary image
created by Tesseract?
On Saturday, June 18, 2016 at 10:16:57 PM UTC+5:30, Julian Einhaus wrote:
>
> Hi,
> I am trying to read three lines of text on a well-defined image (pretty
> much no background noise, characters sepera
Shree,
The issue tracker is not for custom training, simply because there are not
enough people and
it cannot be reproduced...
Did you read: "I have been running about 130G data which are 4000 files"?
Unless you are able to reproduce the problem with very small data,
IMO nobody would be willi
I don't think that would be the case unless your training text is a few hundred
megabytes in size...
I am running Tesseract on Ubuntu 18.04, and based on a very quick test it turned
out that Tesseract on Ubuntu performed better than on Windows in terms of agreement
accuracy (I'm training it for handwritin
Hi Junye Li,
I have a workstation with 36 cores (2.0 GHz) and 24 GB of memory, running RHEL.
I'm now running text2image to generate tif/box files; I guess it still needs
to run for a week.
Next, I will run tesseract to generate .lstm files; I guess it will
take about two weeks.
Finally,
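I ran into the same situation earlier when using hOCR output.
My solution was to recompile tesseract.
I think it may be because I had previously replaced the original models with the best model / bbox model, which caused an incompatibility. So now, before replacing any models, I first back up the data from the original build.
I don't know whether your situation is similar.
On Tue, 27 Nov 2018 at 3:26 AM, Hwa Chuang wrote: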
> I was using Tesseract v4 to generate a PDF file and found that some strings
> can't be searched because of missing characters in the PDF file