Would you recommend that I switch to another OCR engine to retain 
formatting information? I've been considering Kraken, Calamari, the Google 
Vision API, Amazon Rekognition, or OCR4all (which was developed for early 
print).
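
For context on the question above: the quoted reply concerns formatting information, which Tesseract 4 does not retain. Positional layout is a separate matter and, as far as I know, is still exposed through Tesseract's hOCR output (bounding boxes for lines and words). Below is a minimal sketch of reading it from Python, assuming pytesseract, Pillow and the tesseract binary with the `fra` language data are installed. The helper names `ocr_to_hocr` and `hocr_words` are hypothetical, chosen only for this illustration.

```python
import re
import xml.etree.ElementTree as ET


def ocr_to_hocr(image_path, lang="fra"):
    """Run Tesseract on an image and return its hOCR markup as bytes.

    Assumption: pytesseract, Pillow and the tesseract binary are installed.
    They are imported here so the parser below can be used without them.
    """
    import pytesseract
    from PIL import Image

    return pytesseract.image_to_pdf_or_hocr(
        Image.open(image_path),
        lang=lang,
        config="--oem 1 --psm 1",
        extension="hocr",
    )


def hocr_words(hocr):
    """Return a list of (text, (x0, y0, x1, y1)) for each word in hOCR markup."""
    root = ET.fromstring(hocr)
    words = []
    for el in root.iter():
        # hOCR marks recognised words with the class "ocrx_word".
        if "ocrx_word" not in el.get("class", "").split():
            continue
        # The bounding box is stored in the title attribute: "bbox x0 y0 x1 y1; ...".
        match = re.search(r"bbox (\d+) (\d+) (\d+) (\d+)", el.get("title", ""))
        if match:
            text = "".join(el.itertext()).strip()
            words.append((text, tuple(int(v) for v in match.groups())))
    return words
```

Usage would look like `hocr_words(ocr_to_hocr('readonly/greyscale_noise.jpg'))`. pytesseract also provides `image_to_alto_xml` for ALTO output, which needs Tesseract 4.1.0 or later. My understanding is that `image_to_string` returns only the plain-text result, which would explain why adding `alto` to its config string produced no XML.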

On Friday, December 27, 2019 at 12:10:34 UTC-5, shree wrote:
>
> Formatting info is not retained in Tesseract 4. It was available in 3.0x.
>
> On Fri, Dec 27, 2019, 22:29 Scott M. Sanders wrote:
>
>> I added the following code, which has improved the results. I thought 
>> that adding 'alto' would create an xml file with formatting information, 
>> but it didn't work. Is there another way to retain formatting information 
>> in Tesseract?
>>
>> import pytesseract
>> from PIL import Image
>>
>> config = "-l fra --oem 1 --psm 1 alto"
>> text = pytesseract.image_to_string(
>>     Image.open('readonly/greyscale_noise.jpg'), config=config)
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/tesseract-ocr/f2b7bbf7-6f43-4598-8e24-7afac9e1fc38%40googlegroups.com.
>>
>

