Re: [tesseract-ocr] Re: tessedit_create_boxfile condensed like boxaGetBox

2021-04-21 Thread Zdenko Podobny
Use the TSV output, but you will still need to parse it to get line-level information.

Zdenko
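
A minimal sketch of the parsing step mentioned above, assuming the TSV was produced with something like `tesseract image.png out tsv` and has the usual twelve columns (level, page_num, block_num, par_num, line_num, word_num, left, top, width, height, conf, text). The names `TsvLine` and `linesFromTsv` are made up for this illustration and are not part of Tesseract. Level-4 rows supply the line box, level-5 rows supply the words and their confidences:

```cpp
#include <cctype>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <tuple>
#include <vector>

// One text line reassembled from TSV rows (hypothetical helper type).
struct TsvLine {
    int left = 0, top = 0, width = 0, height = 0;  // from the level-4 (line) row
    std::string text;                              // words joined by single spaces
    double confSum = 0.0;                          // sum of word confidences
    int words = 0;                                 // number of words seen
    int meanConf() const { return words > 0 ? static_cast<int>(confSum / words) : -1; }
};

// Groups TSV rows by (page, block, paragraph, line) and returns one entry per line.
std::vector<TsvLine> linesFromTsv(std::istream& in) {
    using Key = std::tuple<int, int, int, int>;
    std::map<Key, TsvLine> byLine;
    std::string row;
    while (std::getline(in, row)) {
        if (!row.empty() && row.back() == '\r') row.pop_back();
        std::vector<std::string> f;
        std::stringstream cells(row);
        std::string cell;
        while (std::getline(cells, cell, '\t')) f.push_back(cell);
        // Skip the header row and anything too short to be a data row.
        if (f.size() < 11 || f[0].empty() ||
            !std::isdigit(static_cast<unsigned char>(f[0][0]))) continue;
        const int level = std::stoi(f[0]);
        if (level != 4 && level != 5) continue;  // only line rows and word rows matter here
        TsvLine& line = byLine[Key{std::stoi(f[1]), std::stoi(f[2]),
                                   std::stoi(f[3]), std::stoi(f[4])}];
        if (level == 4) {
            line.left = std::stoi(f[6]);
            line.top = std::stoi(f[7]);
            line.width = std::stoi(f[8]);
            line.height = std::stoi(f[9]);
        } else if (f.size() > 11 && !f[11].empty()) {
            if (!line.text.empty()) line.text += ' ';
            line.text += f[11];
            line.confSum += std::stod(f[10]);  // conf may be fractional in newer versions
            ++line.words;
        }
    }
    std::vector<TsvLine> out;
    for (const auto& entry : byLine) out.push_back(entry.second);
    return out;
}
```

Each returned entry can then be printed in the same `x=, y=, w=, h=, confidence, text` shape as the API loop quoted below. The mean here is a plain average of word confidences, which may not match `MeanTextConf()` exactly.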


On Wed, 21 Apr 2021 at 16:38, Baris Unsal wrote:

> I want the opposite: getting RIL_TEXTLINE-like output by passing an
> argument to tesseract.
>
> On Wednesday, 21 April 2021 at 17:36:35 UTC+3 Quan Nguyen wrote:
>
>> I think it would need to operate at RIL_SYMBOL level, not RIL_TEXTLINE.
>>
>> On Wednesday, April 21, 2021 at 7:17:04 AM UTC-5 yosoyl...@gmail.com
>> wrote:
>>
>>> Hi, when I pass the argument tessedit_create_boxfile 1 to tesseract, it
>>> outputs the location of each individual character. But when I use the API
>>> like this:
>>>
>>> ```
>>> Boxa* boxes = api->GetComponentImages(tesseract::RIL_TEXTLINE, true, NULL, NULL);
>>> for (int i = 0; i < boxes->n; i++) {
>>>   BOX* box = boxaGetBox(boxes, i, L_CLONE);
>>>   // Restrict recognition to this text line.
>>>   api->SetRectangle(box->x, box->y, box->w, box->h);
>>>   char* outText = api->GetUTF8Text();
>>>   int conf = api->MeanTextConf();
>>>   fprintf(stdout, "Box[%d]: x=%d, y=%d, w=%d, h=%d, confidence: %d, text: %s",
>>>           i, box->x, box->y, box->w, box->h, conf, outText);
>>>   boxDestroy(&box);  // boxDestroy takes the address of the BOX pointer
>>>   delete[] outText;
>>> }
>>> boxaDestroy(&boxes);  // release the Boxa itself
>>> ```
>>> it outputs the whole line, like this:
>>> Box[1]: x=36, y=84, w=246, h=14, confidence: 44, text: #Spor #siyaset Fanket FIliskiler
>>>
>>> Is there any way to combine the individual character boxes so that the
>>> output is printed per line, as the API does? Thanks in advance.
>>>
>>> ### Environment
>>>
>>> * **Tesseract Version**: 
>>> tesseract 4.1.1-rc2-25-g9707
>>>  leptonica-1.78.0
>>>   libgif 5.1.4 : libjpeg 6b (libjpeg-turbo 1.5.2) : libpng 1.6.36 :
>>> libtiff 4.1.0 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
>>>  Found AVX2
>>>  Found AVX
>>>  Found FMA
>>>  Found SSE
>>>  Found libarchive 3.3.3 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.6
>>> liblz4/1.8.3 libzstd/1.3.8
>>>
>>> * **Platform**: 
>>> Linux pardus 4.19.0-13-amd64 #1 SMP Debian 4.19.160-2 (2020-11-28)
>>> x86_64 GNU/Linux
>>>

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/CAJbzG8wS8XdwKW1eG%2BBW2L2ieVMYt%2B4GjAP59tyf%2BQpcWVOkwA%40mail.gmail.com.


[tesseract-ocr] Re: tessedit_create_boxfile condensed like boxaGetBox

2021-04-21 Thread Baris Unsal
I want the opposite: getting RIL_TEXTLINE-like output by passing an
argument to tesseract.


[tesseract-ocr] Re: tessedit_create_boxfile condensed like boxaGetBox

2021-04-21 Thread Quan Nguyen
I think it would need to operate at RIL_SYMBOL level, not RIL_TEXTLINE.
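
To relate the two outputs discussed in this thread: once symbol-level boxes are available (for example from iterating at RIL_SYMBOL, or from the box file), the line box is simply their bounding-box union. The following is only a sketch of that merge step and does not call Tesseract. `Rect` and `enclosingBox` are hypothetical names, and the boxes are assumed to use a top-left origin like Leptonica's BOX. Box files measure y from the bottom of the image, so their coordinates would need flipping first.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Same layout as Leptonica's BOX fields: top-left corner plus size, in pixels.
struct Rect {
    int x, y, w, h;
};

// Returns the smallest rectangle that encloses every symbol box of one text line.
Rect enclosingBox(const std::vector<Rect>& symbols) {
    assert(!symbols.empty());
    int x1 = symbols[0].x;
    int y1 = symbols[0].y;
    int x2 = symbols[0].x + symbols[0].w;
    int y2 = symbols[0].y + symbols[0].h;
    for (const Rect& s : symbols) {
        x1 = std::min(x1, s.x);
        y1 = std::min(y1, s.y);
        x2 = std::max(x2, s.x + s.w);
        y2 = std::max(y2, s.y + s.h);
    }
    return Rect{x1, y1, x2 - x1, y2 - y1};
}
```

Deciding which symbols belong to the same line is a separate problem. The box file does not carry that grouping, which is why a line-aware source such as the TSV output or the API iterator is easier to work with.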

