Re: Reg: Issue in rendering South Indian language (Telugu)

Tilman Hausherr Tue, 20 Jun 2023 20:00:46 -0700

Hi,

This is a known weakness. There is an implementation for Bengali, but not for other Indian languages:

https://issues.apache.org/jira/browse/PDFBOX-4189
It is only in 3.0, and it doesn't handle text extraction properly.

That code might be extended to Telugu if Telugu uses the same concepts from the GSUB table.


If you or anyone else is interested in working on this:
- get the source code
- look at GsubWorkerForBengali
- look at https://learn.microsoft.com/en-us/typography/script-development/bengali
  and compare it with
  https://learn.microsoft.com/en-us/typography/script-development/telugu
  to see what might have to be done for a new GsubWorkerForTelugu
- possibly (not sure if needed) implement the TODOs in GlyphSubstitutionTable and in GlyphSubstitutionDataExtractor
- possibly (not sure if needed) implement GPOS handling
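A first step toward a GsubWorkerForTelugu could be locating the spots where a conjunct has to be formed. The following is a minimal standalone sketch, not PDFBox code: the class TeluguClusterScanner and its method findConjunctStarts are hypothetical names, and the sketch assumes the basic Telugu consonant range U+0C15 to U+0C39 and the virama U+0C4D. It only finds consonant + virama + consonant sequences such as క్య in వాక్యూమ్, where the second consonant is typically drawn as a below-base form (supplied by the font through GSUB features such as blwf). The actual glyph substitution would still have to come from the font's GSUB table.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper, NOT part of PDFBox: locates Telugu
 * consonant + virama + consonant sequences. The consonant after the
 * virama is usually rendered as a below-base ("vattu") form, which a
 * font supplies via GSUB features such as 'blwf'. This class performs
 * no glyph substitution; it only reports where one would be needed.
 */
public final class TeluguClusterScanner {
    // Assumption: basic Telugu consonants KA..HA (U+0C15..U+0C39).
    // Additional consonants (e.g. U+0C58..U+0C5A) are ignored here.
    private static final int KA = 0x0C15;
    private static final int HA = 0x0C39;
    private static final int VIRAMA = 0x0C4D;

    private TeluguClusterScanner() {
    }

    static boolean isConsonant(int cp) {
        return cp >= KA && cp <= HA;
    }

    /**
     * Returns the code-point indices i where cps[i] is a consonant,
     * cps[i + 1] is the virama and cps[i + 2] is another consonant.
     */
    public static List<Integer> findConjunctStarts(String text) {
        int[] cps = text.codePoints().toArray();
        List<Integer> starts = new ArrayList<>();
        for (int i = 0; i + 2 < cps.length; i++) {
            if (isConsonant(cps[i]) && cps[i + 1] == VIRAMA && isConsonant(cps[i + 2])) {
                starts.add(i);
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        // "vacuum" in Telugu, written with escapes to avoid source-encoding issues
        String word = "\u0C35\u0C3E\u0C15\u0C4D\u0C2F\u0C42\u0C2E\u0C4D";
        int[] cps = word.codePoints().toArray();
        for (int start : findConjunctStarts(word)) {
            System.out.printf("conjunct at %d: U+%04X + virama + U+%04X%n",
                    start, cps[start], cps[start + 2]);
        }
    }
}
```

Running main prints one line, `conjunct at 2: U+0C15 + virama + U+0C2F` (KA followed by the subjoined YA). The trailing మ్ is not reported, because that virama is not followed by a consonant and so stays a visible halant rather than forming a conjunct.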

Tilman

On 21.06.2023 03:26, Ravi, Swetha wrote:

Hi Apache PDFBox team,
I am working with the MediaConvert team at AWS Elemental. We use the ttt:ttpe tool to render TTML captions onto the video file. We found issues when rendering a few words in Telugu using PDFBox. For example, the Telugu word వాక్యూమ్ is not rendered properly. I have attached the rendering as a PDF file and the input as an image file to this email. Specifically, the word means "vacuum", and the half "y" sound is missing in the rendered output. So I suspect that half-consonant rendering is the issue. I tried using the latest version of PDFBox to create a PDF for this text (output attached).
Could you please take a look at this issue and let me know whether there is any workaround, or whether a fix might be possible in the near future?
Thank you,

Swetha Ravi

Software Development Engineer

AWS Elemental MediaConvert

