[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16446711#comment-16446711 ] Tilman Hausherr edited comment on PDFBOX-4189 at 4/21/18 9:15 AM: -- that's just a warm-up and to get rid of (some) binaries in the patch. was (Author: tilman): that's just a warm-up and to get rid of binaries in the patch. > Enable rendering of Indian languages, by reading and utilizing the GSUB table > - > > Key: PDFBOX-4189 > URL: https://issues.apache.org/jira/browse/PDFBOX-4189 > Project: PDFBox > Issue Type: New Feature > Components: FontBox, PDModel >Reporter: Palash Ray >Priority: Major > Attachments: Bengali-text-after.pdf, Bengali-text-before.pdf > > Original Estimate: 336h > Remaining Estimate: 336h > > Implemented proper rendering of Indian languages, which need extensive Glyph > substitution. The GSUB table has been read and used effectively to replace > some compound words with their respective Glyphs. All tests are passing. I > have tested this for the Bengali font. Please review these changes and let me > know if it makes sense to incorporate these. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
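The issue description says the GSUB table is used to replace certain character sequences with single glyphs. The Java sketch below illustrates a ligature-style substitution (the idea behind GSUB LookupType 4) applied to a list of glyph IDs. It is only an illustration, and several parts are assumptions:

- the class name,
- the longest-match-first rule,
- the glyph IDs used.

A real Bengali shaper also reorders glyphs and applies several features in the order that the font's lookups define. This sketch models none of that.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not PDFBox code): ligature-style substitution, where a
 * run of glyph IDs is replaced by a single glyph ID.
 */
final class LigatureSketch {

    private LigatureSketch() {
    }

    /**
     * @param gids      glyph IDs after the cmap lookup, in logical order
     * @param ligatures component sequence -> ligature glyph ID (made-up data;
     *                  a real font supplies this through its GSUB table)
     * @return the glyph IDs after substitution
     */
    static List<Integer> substitute(List<Integer> gids, Map<List<Integer>, Integer> ligatures) {
        int maxLen = 0;
        for (List<Integer> components : ligatures.keySet()) {
            maxLen = Math.max(maxLen, components.size());
        }
        List<Integer> out = new ArrayList<>();
        int i = 0;
        while (i < gids.size()) {
            Integer ligature = null;
            int len = Math.min(maxLen, gids.size() - i);
            // Simplification: try the longest candidate run first.
            for (; len >= 2; len--) {
                ligature = ligatures.get(gids.subList(i, i + len));
                if (ligature != null) {
                    break;
                }
            }
            if (ligature != null) {
                out.add(ligature);
                i += len;
            } else {
                out.add(gids.get(i));
                i++;
            }
        }
        return out;
    }
}
```

Lookups rely on `List.equals`/`hashCode`, so a `subList` view matches a key built with `List.of(...)`.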
[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16446685#comment-16446685 ] Palash Ray edited comment on PDFBOX-4189 at 4/21/18 8:28 AM: - I know. If you ask me, it's a real shame. The reason we have abstractions and specifications is that we are supposed to be able to figure out pretty much all the rules without having to write language-specific handlers. But I think even the font developers are to blame. They should push these big companies who build these specifications to do a better job. Anyway, sorry for the rant :) was (Author: paawak): I know. If you ask me, it's a real shame. The reason we have abstractions and specifications is that we are supposed to be able to figure out pretty much the rules without having to write language-specific handlers. But I think even the font developers are to blame. They should push these big companies who build these specifications to do a better job. Anyway, sorry for the rant :)
[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438914#comment-16438914 ] John Hewson edited comment on PDFBOX-4189 at 4/16/18 1:45 AM: -- {quote} For correct text positioning using mixed language information from the following tables might be useful: - GPOS: to adjust the glyph position - BASE: baseline offsets on a script-by-script basis. - JSTF: justification information, including whitespace and Kashida adjustments. - BIDI Mirroring: https://www.unicode.org/Public/10.0.0/ucd/BidiMirroring.txt{quote} It's probably worth noting that BASE, JSTF and BiDi are concerned with _paragraph-level_ layout, which happens at a higher level than the proposed layout() - which would be concerned with only a single script in a single direction (i.e. only OpenType _shaping_). BASE and BiDi are related to changes between different scripts, while JSTF is to aid in making good line break choices. So all of that functionality will happen somewhere else (this fits very closely with the layout code we have for forms, for example). So in layout we're really only going to be concerned with GPOS and GSUB features. That way the only options that one might want to pass to layout would be the list of which [feature flags|https://docs.microsoft.com/en-us/typography/opentype/spec/featurelist] to apply. Maybe layout() should be called shapeText() to emphasize this distinction? was (Author: jahewson): {quote} For correct text positioning using mixed language information from the following tables might be useful: - GPOS: to adjust the glyph position - BASE: baseline offsets on a script-by-script basis. - JSTF: justification information, including whitespace and Kashida adjustments. 
- BIDI Mirroring: https://www.unicode.org/Public/10.0.0/ucd/BidiMirroring.txt{quote} It's probably worth noting that BASE, JSTF and BiDi are concerned with _paragraph-level_ layout, which happens at a higher level than the proposed layout() - which would be concerned with only a single script in a single direction (i.e. only OpenType _shaping_). BASE and BiDi are related to changes between different scripts, while JSTF is to aid in making good line break choices. So all of that functionality will happen somewhere else (this fits very closely with the layout code we have for forms, for example). So in layout we're really only going to be concerned with GPOS and GSUB features. That way the only options that one might want to pass to layout would be this list of which [feature flags|https://docs.microsoft.com/en-us/typography/opentype/spec/featurelist] to apply. Maybe layout() should be called shapeText() to emphasize this distinction?
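John's point is that a shaping step only needs to know which OpenType features to apply. The Java sketch below shows that contract. The name shapeText follows his suggestion. The signature, and the use of a map of per-feature glyph transforms, are assumptions for illustration. In a real font the order of application comes from the GSUB LookupList, not from the iteration order of a map.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

/**
 * Hypothetical sketch of a shaping entry point whose only option is the set
 * of OpenType feature tags to apply. Not a PDFBox API.
 */
final class ShapeTextSketch {

    private ShapeTextSketch() {
    }

    /**
     * @param gids            glyph IDs for a run of one script in one direction
     * @param features        feature tag -> glyph transformation, applied in the
     *                        map's iteration order (use a LinkedHashMap to fix it)
     * @param enabledFeatures tags the caller asked for, e.g. "liga" or "akhn"
     */
    static List<Integer> shapeText(List<Integer> gids,
                                   Map<String, UnaryOperator<List<Integer>>> features,
                                   Set<String> enabledFeatures) {
        List<Integer> current = gids;
        for (Map.Entry<String, UnaryOperator<List<Integer>>> feature : features.entrySet()) {
            // Features the caller did not enable are skipped entirely.
            if (enabledFeatures.contains(feature.getKey())) {
                current = feature.getValue().apply(current);
            }
        }
        return current;
    }
}
```

Keeping the options down to a set of tags leaves BiDi, baselines and justification to the paragraph-level code, as the comment suggests.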
[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438914#comment-16438914 ] John Hewson edited comment on PDFBOX-4189 at 4/16/18 1:44 AM: -- {quote} For correct text positioning using mixed language information from the following tables might be useful: - GPOS: to adjust the glyph position - BASE: baseline offsets on a script-by-script basis. - JSTF: justification information, including whitespace and Kashida adjustments. - BIDI Mirroring: https://www.unicode.org/Public/10.0.0/ucd/BidiMirroring.txt{quote} It's probably worth noting that BASE, JSTF and BiDi are concerned with _paragraph-level_ layout, which happens at a higher level than the proposed layout() - which would be concerned with only a single script in a single direction (i.e. only OpenType _shaping_). BASE and BiDi are related to changes between different scripts, while JSTF is to aid in making good line break choices. So all of that functionality will happen somewhere else (this fits very closely with the layout code we have for forms, for example). So in layout we're really only going to be concerned with GPOS and GSUB features. That way the only options that one might want to pass to layout would be this list of which [feature flags|https://docs.microsoft.com/en-us/typography/opentype/spec/featurelist] to apply. Maybe layout() should be called shapeText() to emphasize this distinction? was (Author: jahewson): {quote} For correct text positioning using mixed language information from the following tables might be useful: - GPOS: to adjust the glyph position - BASE: baseline offsets on a script-by-script basis. - JSTF: justification information, including whitespace and Kashida adjustments. 
- BIDI Mirroring: https://www.unicode.org/Public/10.0.0/ucd/BidiMirroring.txt{quote} It's probably worth noting that BASE, JSTF and BiDi are concerned with _paragraph-level_ layout, which happens at a higher level than the proposed layout() - which would be concerned with only a single script in a single direction (i.e. only OpenType _shaping_). BASE and BiDi are related to changes between different scripts, while JSTF is to aid in making good line break choices. So all of that functionality will happen somewhere else (this fits very closely with the layout code for forms, for example). So in layout we're really only going to be concerned with GPOS and GSUB features. That way the only options that one might want to pass to layout would be this list of which [feature flags|https://docs.microsoft.com/en-us/typography/opentype/spec/featurelist] to apply. Maybe layout() should be called shapeText() to emphasize this distinction?
[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438914#comment-16438914 ] John Hewson edited comment on PDFBOX-4189 at 4/16/18 1:41 AM: -- {quote} For correct text positioning using mixed language information from the following tables might be useful: - GPOS: to adjust the glyph position - BASE: baseline offsets on a script-by-script basis. - JSTF: justification information, including whitespace and Kashida adjustments. - BIDI Mirroring: https://www.unicode.org/Public/10.0.0/ucd/BidiMirroring.txt{quote} It's probably worth noting that BASE, JSTF and BiDi are concerned with _paragraph-level_ layout, which happens at a higher level than the proposed layout() - which would be concerned with only a single script in a single direction (i.e. only OpenType _shaping_). BASE and BiDi are related to changes between different scripts, while JSTF is to aid in making good line break choices. So all of that functionality will happen somewhere else (this fits very closely with the layout code for forms, for example). So in layout we're really only going to be concerned with GPOS and GSUB features. That way the only options that one might want to pass to layout would be this list of which [feature flags|https://docs.microsoft.com/en-us/typography/opentype/spec/featurelist] to apply. Maybe layout() should be called shapeText() to emphasize this distinction? was (Author: jahewson): {quote} For correct text positioning using mixed language information from the following tables might be useful: - GPOS: to adjust the glyph position - BASE: baseline offsets on a script-by-script basis. - JSTF: justification information, including whitespace and Kashida adjustments.
- BIDI Mirroring: https://www.unicode.org/Public/10.0.0/ucd/BidiMirroring.txt{quote} It's probably worth noting that BASE, JSTF and BiDi are concerned with _paragraph-level_ layout, which happens at a higher level than the proposed layout() - which would be concerned with only a single script in a single direction (i.e. only OpenType _shaping_). BASE and BiDi are related to changes between different scripts, while JSTF is to aid in making good line break choices. So all of that functionality will happen somewhere else (this fits very closely with the layout code for forms, for example). So in layout we're really only going to be concerned with GPOS and GSUB features. That way the only options that one might want to pass to layout would be this list of which [feature flags|https://docs.microsoft.com/en-us/typography/opentype/spec/featurelist] to apply.
[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438914#comment-16438914 ] John Hewson edited comment on PDFBOX-4189 at 4/16/18 1:40 AM: -- {quote} For correct text positioning using mixed language information from the following tables might be useful: - GPOS: to adjust the glyph position - BASE: baseline offsets on a script-by-script basis. - JSTF: justification information, including whitespace and Kashida adjustments. - BIDI Mirroring: https://www.unicode.org/Public/10.0.0/ucd/BidiMirroring.txt{quote} It's probably worth noting that BASE, JSTF and BiDi are concerned with _paragraph-level_ layout, which happens at a higher level than the proposed layout() - which would be concerned with only a single script in a single direction (i.e. only OpenType _shaping_). BASE and BiDi are related to changes between different scripts, while JSTF is to aid in making good line break choices. So all of that functionality will happen somewhere else (this fits very closely with the layout code for forms, for example). So in layout we're really only going to be concerned with GPOS and GSUB features. That way the only options that one might want to pass to layout would be this list of which [feature flags|https://docs.microsoft.com/en-us/typography/opentype/spec/featurelist] to apply. was (Author: jahewson): For correct text positioning using mixed language information from the following tables might be useful: - GPOS: to adjust the glyph position - BASE: baseline offsets on a script-by-script basis. - JSTF: justification information, including whitespace and Kashida adjustments. - BIDI Mirroring: https://www.unicode.org/Public/10.0.0/ucd/BidiMirroring.txt bq. here BASE, JSTF and BiDi are concerned with _paragraph-level_ layout, which happens at a higher level than the proposed layout() - which would be concerned with only a single script in a single direction (i.e.
only OpenType _shaping_). BASE and BiDi are related to changes between different scripts, while JSTF is to aid in making good line break choices. So all of that functionality will happen somewhere else (this fits very closely with the layout code for forms, for example). So in layout we're really only going to be concerned with GPOS and GSUB features. That way the only options that one might want to pass to layout would be this list of which [feature flags|https://docs.microsoft.com/en-us/typography/opentype/spec/featurelist] to apply.
[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438609#comment-16438609 ] Maruan Sahyoun edited comment on PDFBOX-4189 at 4/15/18 8:34 AM: - The patch is a great and - given several questions we had in the past - important addition to PDFBox. In the longer run I'd see some additions we might conceptually already think about and/or start introducing in the public API. As I haven't reviewed the patch, the list below is meant as a hint for possible additions. They may already be included. For correct text positioning using mixed language information from the following tables might be useful: - GPOS: to adjust the glyph position - BASE: baseline offsets on a script-by-script basis. - JSTF: justification information, including whitespace and Kashida adjustments. - BIDI Mirroring: https://www.unicode.org/Public/10.0.0/ucd/BidiMirroring.txt To allow the user to override the language system identified by the script being used, we might want to add {{setLanguage/getLanguage}} so that they can be called prior to {{showText}} if an override needs to be done. Putting that into an internal {{layout}} method as John suggested would also allow us to put it behind a feature flag where one could enable/disable the processing. We might also mark that feature as **experimental** and specify which languages it has been tested with (to some extent). This is mainly meant to understand which capabilities belong where, as I'm looking to add the processing to layout of interactive form field values. was (Author: msahyoun): The patch is a great and - given several questions we had in the past - important addition to PDFBox. In the longer run I'd see some additions we might conceptually already think about and/or start introducing in the public API. As I haven't reviewed the patch, the list below is meant as a hint for possible additions.
They may already be included. For correct text positioning using mixed language information from the following tables might be useful: - GPOS: to adjust the glyph position - BASE: baseline offsets on a script-by-script basis. - JSTF: justification information, including whitespace and Kashida adjustments. - BIDI Mirroring: https://www.unicode.org/Public/10.0.0/ucd/BidiMirroring.txt To allow the user to override the language system identified by the script being used, we might want to add {{setLanguage/getLanguage}} so that they can be called prior to {{showText}} if an override needs to be done. Putting that into an internal {{layout}} method as John suggested would also allow us to put it behind a feature flag where one could enable/disable the processing. We might also mark that feature as **experimental** and specify which languages it has been tested with (to some extent).
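Maruan's suggestion separates two things:

- the script, which can be detected from the text itself;
- the language system, which generally cannot be detected and so needs a caller override.

The Java sketch below uses the JDK's `Character.UnicodeScript` for script detection. The class, the small tag table and the enable/disable flag are assumptions for illustration. The Indic entries use the original tags ("beng", "deva"); fonts may also provide the newer "bng2" and "dev2" scripts.

```java
import java.util.Map;

/**
 * Hypothetical sketch: the OpenType script tag is detected from the text,
 * while the language system comes only from the caller (in the spirit of the
 * setLanguage/getLanguage idea). Not a PDFBox API.
 */
final class LanguageSystemSketch {

    // Small, illustrative subset of Unicode script -> OpenType script tag.
    private static final Map<Character.UnicodeScript, String> SCRIPT_TAGS = Map.of(
            Character.UnicodeScript.LATIN, "latn",
            Character.UnicodeScript.BENGALI, "beng",
            Character.UnicodeScript.DEVANAGARI, "deva",
            Character.UnicodeScript.ARABIC, "arab");

    private String language;               // null: use the script's default language system
    private boolean shapingEnabled = true; // the enable/disable switch

    void setLanguage(String openTypeLanguageTag) {
        this.language = openTypeLanguageTag;
    }

    String getLanguage() {
        return language;
    }

    void setShapingEnabled(boolean enabled) {
        this.shapingEnabled = enabled;
    }

    boolean isShapingEnabled() {
        return shapingEnabled;
    }

    /** Script tag of the first code point that belongs to a specific script. */
    static String detectScriptTag(String text) {
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            Character.UnicodeScript script = Character.UnicodeScript.of(codePoint);
            // Spaces, ASCII digits and punctuation (COMMON) and combining
            // marks (INHERITED) do not decide the script.
            if (script != Character.UnicodeScript.COMMON
                    && script != Character.UnicodeScript.INHERITED) {
                return SCRIPT_TAGS.getOrDefault(script, "DFLT");
            }
            i += Character.charCount(codePoint);
        }
        return "DFLT";
    }
}
```

If no language is set, the font's default language system for the detected script would apply. With shaping disabled, a caller would presumably keep today's plain encoding path.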
[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438540#comment-16438540 ] John Hewson edited comment on PDFBOX-4189 at 4/15/18 1:04 AM: -- Hi guys, this is a really welcome contribution, thank you. With regard to PDFont#encode(String text) being non-final, I can add some insight, as I was the original designer of our current PDFont#encode mechanism. Basically, the PDFont classes are designed to represent fonts identically to how they are represented when embedded in PDF files. So there's no support for OpenType, by design. A Type0 font knows nothing about OpenType (but we can relax this a bit, as I explain below). So how can we use OpenType in PDFBox? The answer is that we do it one layer of abstraction up, during text _layout_ instead of text _encoding_. So you want to put your glyph substitution code inside PDPageContentStream#showText; actually, you want [PDPageContentStream#showTextInternal|https://github.com/apache/pdfbox/blob/7e721643c0b1fca9fdc349f78431f36e68abc097/pdfbox/src/main/java/org/apache/pdfbox/contentstream/PDAbstractContentStream.java#L256]. That way PDFont#encode(String text) can stay non-final :) *OpenType*: In general, OpenType layouts consist of glyph _substitutions_ (via GSUB) and _positionings_ (via GPOS). Obviously it's not possible to handle positionings in PDFont#encode(), so that helps explain why showText() is the right place for OpenType, as showText performs both positioning and encoding. We also need to keep track of glyphs for subsetting, which is not possible in encode(). *Subsetting*: We currently track which glyphs need to be included in a subset by using their Unicode code point, but with GSUB enabled we will have to keep track of some substituted glyphs via their glyph id (GID), because the glyphs which result from a substitution don't necessarily have their own code points (no entry in the cmap table).
This should be easy to add to TTFSubsetter, as it already tracks glyph ids internally; we just need the ability to pass them in too, e.g. addGlyphId(integer). Then PDPageContentStream#showText will be responsible for passing the glyph ids. But now we need showText to know about those glyph ids, which leads me to *Glyph IDs:* The JDK represents text which has been through OpenType layout as a [GlyphVector|https://docs.oracle.com/javase/7/docs/api/java/awt/font/GlyphVector.html], which encapsulates substitutions via GID and positioning via a transform associated with each glyph. PDFBox might want to do something similar. I think it would even be ok to add this to PDType0Font (because I'm suggesting a specific OpenType API, so it doesn't interfere with our PDType0Font's non-OpenType assumption) in the form of a method such as: {{final PDFGlyphVector layout(String text)}} which is called from PDPageContentStream#showText instead of encode(text). I also think it would be fine to use instanceof to detect this case, because only PDType0Font needs to have this capability. I'm assuming PDFGlyphVector is our own very simple version of the JDK's GlyphVector, which is effectively just a vector of (gid, dx, dy) tuples. Then all that PDPageContentStream#showText needs to know how to do is to draw a PDFGlyphVector on the page, by converting it into the equivalent text drawing operations (Tj and the like). Because this patch is just for GSUB, all of those positioning values can just be zero, and we need not implement any actual glyph positioning in showText() yet :). Thus PDFGlyphVector will serve simply as an array of GIDs. Phew! That was a lot of information. Just to be clear, the current patch is not compatible with subsetting without making some changes. P.S. Make sure any new APIs are {{final}}. All of the suggestions above consist of adding only non-breaking APIs, which is nice. Thanks! was (Author: jahewson): Hi guys, this is a really welcome contribution, thank you.
With regard to PDFont#encode(String text) being non-final, I can add some insight, as I was the original designer of our current PDFont#encode mechanism. Basically, the PDFont classes are designed to represent fonts identically to how they are represented when embedded in PDF files. So there's no support for OpenType, by design. A Type0 font knows nothing about OpenType (but we can relax this a bit, as I explain below). So how can we use OpenType in PDFBox? The answer is that we do it one layer of abstraction up, during text _layout_ instead of text _encoding_. So you want to put your glyph substitution code inside PDPageContentStream#showText; actually, you want [PDPageContentStream#showTextInternal|https://github.com/apache/pdfbox/blob/7e721643c0b1fca9fdc349f78431f36e68abc097/pdfbox/src/main/java/org/apache/pdfbox/contentstream/PDAbstractContentStream.java#L256]. That way PDFont#encode(Stri
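The comment describes PDFGlyphVector as a vector of (gid, dx, dy) tuples. The Java sketch below shows roughly what that could look like. PDFGlyphVector is only the name proposed in the comment; the fields, the methods and the Tj conversion are all assumptions. The conversion assumes an Identity-H Type0 font whose CIDs equal GIDs (a /CIDToGIDMap of /Identity), so each glyph becomes a two-byte code. Because the patch is GSUB-only, the sketch rejects non-zero offsets rather than silently dropping them.

```java
/**
 * Hypothetical, minimal PDFGlyphVector: a vector of (gid, dx, dy) tuples.
 * Not an existing PDFBox class.
 */
final class PDFGlyphVector {
    private final int[] gids;
    private final float[] dx;
    private final float[] dy;

    PDFGlyphVector(int[] gids, float[] dx, float[] dy) {
        if (gids.length != dx.length || gids.length != dy.length) {
            throw new IllegalArgumentException("gids, dx and dy must have the same length");
        }
        this.gids = gids.clone();
        this.dx = dx.clone();
        this.dy = dy.clone();
    }

    /** GSUB-only output: every positioning offset is zero. */
    static PDFGlyphVector ofGlyphIds(int... gids) {
        return new PDFGlyphVector(gids, new float[gids.length], new float[gids.length]);
    }

    int size() {
        return gids.length;
    }

    int getGlyphId(int index) {
        return gids[index];
    }

    /**
     * Emits a single Tj operator. Assumes Identity-H with CIDs equal to GIDs,
     * so each glyph is written as a two-byte, big-endian hex code.
     */
    String toTjOperator() {
        StringBuilder sb = new StringBuilder("<");
        for (int i = 0; i < gids.length; i++) {
            if (dx[i] != 0f || dy[i] != 0f) {
                throw new IllegalStateException("positioning (GPOS) is not handled in this sketch");
            }
            sb.append(String.format("%04X", gids[i]));
        }
        return sb.append("> Tj").toString();
    }
}
```

For example, `PDFGlyphVector.ofGlyphIds(0x41, 0x2A).toTjOperator()` yields `<0041002A> Tj`.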
[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438540#comment-16438540 ] John Hewson edited comment on PDFBOX-4189 at 4/15/18 12:57 AM: --- Hi guys, this is a really welcome contribution, thank you. With regard to PDFont#encode(String text) being non-final, I can add some insight, as I was the original designer of our current PDFont#encode mechanism. Basically, the PDFont classes are designed to represent fonts identically to how they are represented when embedded in PDF files. So there's no support for OpenType, by design. A Type0 font knows nothing about OpenType (but we can relax this a bit, as I explain below). So how can we use OpenType in PDFBox? The answer is that we do it one layer of abstraction up, during text _layout_ instead of text _encoding_. So you want to put your glyph substitution code inside PDPageContentStream#showText; actually, you want [PDPageContentStream#showTextInternal|https://github.com/apache/pdfbox/blob/7e721643c0b1fca9fdc349f78431f36e68abc097/pdfbox/src/main/java/org/apache/pdfbox/contentstream/PDAbstractContentStream.java#L256]. That way PDFont#encode(String text) can stay non-final :) *OpenType*: In general, OpenType layouts consist of glyph _substitutions_ (via GSUB) and _positionings_ (via GPOS). Obviously it's not possible to handle positionings in PDFont#encode(), so that helps explain why showText() is the right place for OpenType, as showText performs both positioning and encoding. We also need to keep track of glyphs for subsetting, which is not possible in encode(). *Subsetting*: We currently track which glyphs need to be included in a subset by using their Unicode code point, but with GSUB enabled we will have to keep track of some substituted glyphs via their glyph id (GID), because the glyphs which result from a substitution don't necessarily have their own code points (no entry in the cmap table).
This should be easy to add to TTFSubsetter, as it already tracks glyph ids internally; we just need the ability to pass them in too, e.g. addGlyphId(integer). Then PDPageContentStream#showText will be responsible for passing the glyph ids. But now we need showText to know about those glyph ids, which leads me to *Glyph IDs:* The JDK represents text which has been through OpenType layout as a [GlyphVector|https://docs.oracle.com/javase/7/docs/api/java/awt/font/GlyphVector.html], which encapsulates substitutions via GID and positioning via a transform associated with each glyph. PDFBox might want to do something similar. I think it would even be ok to add this to PDType0Font (because I'm suggesting a specific OpenType API, so it doesn't interfere with our PDType0Font's non-OpenType assumption) in the form of a method such as: {{final PDFGlyphVector layout(String text)}} which is called from PDPageContentStream#showText instead of encode(text). I also think it would be fine to use instanceof to detect this case, because only PDType0Font needs to have this capability. I'm assuming PDFGlyphVector is our own very simple version of the JDK's GlyphVector, which is effectively just a vector of (gid, dx, dy) tuples. Then all that PDPageContentStream#showText needs to know how to do is to draw a PDFGlyphVector on the page, by converting it into the equivalent text drawing operations (Tj and the like). Because this patch is just for GSUB, all of those positioning values can just be zero, and we need not implement any actual glyph positioning in showText() yet :). Thus PDFGlyphVector will serve simply as an array of GIDs. Phew! That was a lot of information. Just to be clear, the current patch is not compatible with subsetting without making some changes. P.S. Make sure any new APIs are {{final}}. was (Author: jahewson): Hi guys, this is a really welcome contribution, thank you.
With regard to PDFont#encode(String text) being non-final, I can add some insight, as I was the original designer of our current PDFont#encode mechanism. Basically, the PDFont classes are designed to represent fonts identically to how they are represented when embedded in PDF files. So there's no support for OpenType, by design. A Type0 font knows nothing about OpenType (but we can relax this a bit, as I explain below). So how can we use OpenType in PDFBox? The answer is that we do it one layer of abstraction up, during text _layout_ instead of text _encoding_. So you want to put your glyph substitution code inside PDPageContentStream#showText; actually, you want [PDPageContentStream#showTextInternal|https://github.com/apache/pdfbox/blob/7e721643c0b1fca9fdc349f78431f36e68abc097/pdfbox/src/main/java/org/apache/pdfbox/contentstream/PDAbstractContentStream.java#L256]. That way PDFont#encode(String text) can stay non-final :) *OpenType*: In general, OpenType layouts consist of glyph _sub
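Once GPOS is supported, the dx values would no longer all be zero. One way to express a horizontal advance adjustment in PDF is a number inside a TJ array.

- A number n in a TJ array moves the following glyph by -(n / 1000) * Tfs * Th in text space.
- An adjustment of dx font units, in a font with unitsPerEm units per em, should move the glyph by (dx / unitsPerEm) * Tfs * Th.
- Equating the two gives n = -1000 * dx / unitsPerEm.

The Java sketch below applies that formula. It assumes dx is an extra advance after each glyph (kerning-style), which is an interpretation rather than something the comment specifies. Character and word spacing are ignored, and the hex codes again assume Identity-H with CIDs equal to GIDs.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/**
 * Hypothetical sketch: glyph IDs plus per-glyph advance adjustments written
 * as a TJ array. Not a PDFBox API.
 */
final class TjArraySketch {

    private TjArraySketch() {
    }

    /**
     * @param gids          glyph IDs, written as two-byte hex codes
     * @param advanceAdjust extra advance after each glyph, in font design units
     * @param unitsPerEm    the font's units per em (from its 'head' table)
     */
    static String toTJ(int[] gids, float[] advanceAdjust, int unitsPerEm) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < gids.length; i++) {
            sb.append(String.format("<%04X>", gids[i]));
            // TJ numbers are thousandths of text space and are subtracted
            // from the pen position, hence the minus sign.
            float n = -1000f * advanceAdjust[i] / unitsPerEm;
            if (n != 0f) {
                sb.append(' ').append(formatNumber(n)).append(' ');
            }
        }
        return sb.append("] TJ").toString();
    }

    /** Up to three decimals, without trailing zeros or exponent notation. */
    private static String formatNumber(float n) {
        return BigDecimal.valueOf(n).setScale(3, RoundingMode.HALF_UP)
                .stripTrailingZeros().toPlainString();
    }
}
```

A positive dx (looser spacing) therefore appears as a negative TJ number, and a negative dx (tighter spacing) appears as a positive one.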
[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438540#comment-16438540 ] John Hewson edited comment on PDFBOX-4189 at 4/15/18 12:52 AM: --- Hi guys, this is a really welcome contribution, thank you. With regards to PDFont#encode(String text) being non-final I can add some insight as I was the original designer of our current PDFont#encode mechanism. Basically, the PDFont classes are designed to represent fonts identically to how they are represented when embedded in PDF files. So there's no support for OpenType, by design. A Type0 font knows nothing about OpenType (but we can relax this a bit, as I explain below). So how can we use OpenType in PDFBox? The answer is that we do it one layer of abstraction up, during text _layout_ instead of text _encoding_*_._* So you want to put your glyph substitution code inside PDPageContentStream#showText, actually you want [PDPageContentStream#showTextInternal|https://github.com/apache/pdfbox/blob/7e721643c0b1fca9fdc349f78431f36e68abc097/pdfbox/src/main/java/org/apache/pdfbox/contentstream/PDAbstractContentStream.java#L256]. That way PDFont#encode(String text) can stay non-final :) *OpenType*: In general, OpenType layouts consist of glyph _substitutions_ (via GSUB) and _positionings_ (via GPOS). Obviously it's not possible to handle positionings in PDFont#encode(), so that helps explain why showText() is the right place for OpenType, as showText performs both positioning and encoding. We also need to keep track of glyphs for subsetting, which is not possible in encode(). *Subsetting*: We currently track which glyphs need to be included in a subset by using their Unicode code point, but with GSUB enabled we will have to keep track of some substituted glyphs via their glyph id (GID), because the glyphs which result from a substitution don't necessarily have their own code points (no entry in the cmap table).
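To make the point about substituted glyphs concrete, here is a simplified Java sketch of a GSUB ligature substitution (lookup type 4) applied to a sequence of glyph ids. The class name `LigatureSubstitution`, its methods and every glyph id used with it are hypothetical; real GSUB processing (script and feature selection, lookup flags, the other lookup types) is deliberately left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Simplified sketch of a GSUB ligature substitution (lookup type 4): a run of
 * glyph ids is replaced by one ligature glyph id. Scripts, features, lookup
 * flags and the other lookup types are not modelled.
 */
final class LigatureSubstitution {

    private static final class Ligature {
        final int[] components; // full input sequence, first glyph included
        final int ligatureGid;

        Ligature(int[] components, int ligatureGid) {
            this.components = components;
            this.ligatureGid = ligatureGid;
        }
    }

    // First component GID -> candidate ligatures. Real fonts define the order of
    // the candidates; this sketch simply tries the longest sequence first.
    private final Map<Integer, List<Ligature>> byFirstGlyph = new HashMap<>();

    void add(int[] components, int ligatureGid) {
        List<Ligature> candidates =
                byFirstGlyph.computeIfAbsent(components[0], k -> new ArrayList<>());
        candidates.add(new Ligature(components.clone(), ligatureGid));
        candidates.sort(Comparator.comparingInt((Ligature l) -> l.components.length).reversed());
    }

    /** Applies the substitutions in a single left-to-right pass. */
    int[] apply(int[] gids) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        while (i < gids.length) {
            Ligature match = null;
            for (Ligature lig : byFirstGlyph.getOrDefault(gids[i], Collections.emptyList())) {
                if (matchesAt(gids, i, lig.components)) {
                    match = lig;
                    break;
                }
            }
            if (match != null) {
                out.add(match.ligatureGid); // the whole run collapses to one glyph
                i += match.components.length;
            } else {
                out.add(gids[i]); // no ligature starts here: copy the glyph through
                i++;
            }
        }
        return out.stream().mapToInt(Integer::intValue).toArray();
    }

    private static boolean matchesAt(int[] gids, int start, int[] components) {
        if (start + components.length > gids.length) {
            return false;
        }
        for (int k = 0; k < components.length; k++) {
            if (gids[start + k] != components[k]) {
                return false;
            }
        }
        return true;
    }
}
```

With hypothetical GIDs 10, 11 and 12 standing for the Bengali KA, VIRAMA and SSA, and 200 for the conjunct they form, `add(new int[] {10, 11, 12}, 200)` followed by `apply(new int[] {10, 11, 12, 30})` yields `{200, 30}`. Glyph 200 is reachable only through GSUB, never through a code point.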
This should be easy to add to TTFSubsetter as it already tracks glyph ids internally, we just need the ability to pass them in too, e.g. addGlyphId(integer). Then PDPageContentStream#showText will be responsible for passing the glyph ids. But now we need showText to know about those glyph ids, which leads me to *Glyph IDs:* The JDK represents text which has been through OpenType layout as a [GlyphVector|https://docs.oracle.com/javase/7/docs/api/java/awt/font/GlyphVector.html] which encapsulates substitutions via GID and positioning via a transform associated with each glyph. PDFBox might want to do something similar, I think it would even be ok to add this to PDType0Font (because I'm suggesting a specific OpenType API so it doesn't interfere with our PDType0Font's non-OpenType assumption) in the form of a method such as: {{PDFGlyphVector layout(String text)}} which is called from PDPageContentStream#showText instead of encode(text). I also think it would be fine to use instanceof to detect this case, because only PDType0Font need have this capability. I'm assuming PDFGlyphVector is our own very simple version of the JDK's GlyphVector, which is effectively just a vector of (gid, dx, dy) tuples. Then all that PDPageContentStream#showText needs to know how to do is to draw a PDFGlyphVector on the page, by converting it into the equivalent text drawing operations (Tj and the like). Phew! That was a lot of information. Just to be clear, the current patch is not compatible with subsetting without making some changes. was (Author: jahewson): Hi guys, this is a really welcome contribution, thank you. With regards to PDFont#encode(String text) being non-final I can add some insight as I was the original designer of our current PDFont#encode mechanism. Basically, the PDFont classes are designed to represent fonts identically to how they are represented when embedded in PDF files. So there's no support for OpenType, by design. 
A Type0 font knows nothing about OpenType (but we can relax this a bit, as I explain below). So how can we use OpenType in PDFBox? The answer is that we do it one layer of abstraction up, during text _layout_ instead of text _encoding_*_._* So you want to put your glyph substitution code inside PDPageContentStream#showText, actually you want [PDPageContentStream#showTextInternal|https://github.com/apache/pdfbox/blob/7e721643c0b1fca9fdc349f78431f36e68abc097/pdfbox/src/main/java/org/apache/pdfbox/contentstream/PDAbstractContentStream.java#L256]. That way PDFont#encode(String text) can stay non-final :) *OpenType*: In general, OpenType layouts consist of glyph _substitutions_ (via GSUB) and _positionings_ (via GPOS). Obviously it's not possible to handle positionings in PDFont#encode(), so that helps explain why showText() is the right place for OpenType, as showText performs both positioning and encoding. We also need to keep tra
[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438540#comment-16438540 ] John Hewson edited comment on PDFBOX-4189 at 4/15/18 12:51 AM: --- Hi guys, this is a really welcome contribution, thank you. With regards to PDFont#encode(String text) being non-final I can add some insight as I was the original designer of our current PDFont#encode mechanism. Basically, the PDFont classes are designed to represent fonts identically to how they are represented when embedded in PDF files. So there's no support for OpenType, by design. A Type0 font knows nothing about OpenType (but we can relax this a bit, as I explain below). So how can we use OpenType in PDFBox? The answer is that we do it one layer of abstraction up, during text _layout_ instead of text _encoding_*_._* So you want to put your glyph substitution code inside PDPageContentStream#showText, actually you want [PDPageContentStream#showTextInternal|https://github.com/apache/pdfbox/blob/7e721643c0b1fca9fdc349f78431f36e68abc097/pdfbox/src/main/java/org/apache/pdfbox/contentstream/PDAbstractContentStream.java#L256]. That way PDFont#encode(String text) can stay non-final :) *OpenType*: In general, OpenType layouts consist of glyph _substitutions_ (via GSUB) and _positionings_ (via GPOS). Obviously it's not possible to handle positionings in PDFont#encode(), so that helps explain why showText() is the right place for OpenType, as showText performs both positioning and encoding. We also need to keep track of glyphs for subsetting, which is not possible in encode(). *Subsetting*: We currently track which glyphs need to be included in a subset by using their Unicode code point, but with GSUB enabled we will have to keep track of some substituted glyphs via their glyph id (GID), because the glyphs which result from a substitution don't necessarily have their own code points (no entry in the cmap table).
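A minimal Java sketch of the subsetting change, assuming glyphs can be registered both by code point and by GID. `GlyphSubsetTracker`, `addCodePoint` and the map standing in for the cmap table are illustrative assumptions; only the idea of an `addGlyphId` method comes from this thread, and this is not TTFSubsetter's actual API.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical sketch, not TTFSubsetter's real API: records which glyphs a
 * subset must keep, both by Unicode code point (resolved through a stand-in
 * cmap) and directly by glyph id for glyphs that only GSUB can produce.
 */
final class GlyphSubsetTracker {
    private final Map<Integer, Integer> cmap;             // code point -> GID
    private final Set<Integer> glyphIds = new TreeSet<>();

    GlyphSubsetTracker(Map<Integer, Integer> cmap) {
        this.cmap = cmap;
        glyphIds.add(0); // GID 0 (.notdef) is conventionally kept in every subset
    }

    /** Keeps the glyph the cmap maps this code point to, if there is one. */
    void addCodePoint(int codePoint) {
        Integer gid = cmap.get(codePoint);
        if (gid != null) {
            glyphIds.add(gid);
        }
    }

    /** Keeps a glyph by id, e.g. a ligature that has no cmap entry. */
    void addGlyphId(int gid) {
        glyphIds.add(gid);
    }

    /** Glyph ids the subset must contain, in ascending order. */
    Set<Integer> glyphIds() {
        return Collections.unmodifiableSet(glyphIds);
    }
}
```

Registering KA, VIRAMA and SSA (U+0995, U+09CD, U+09B7) by code point keeps their base glyphs, but a conjunct glyph with no cmap entry survives subsetting only if it is also passed to `addGlyphId`.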
This should be easy to add to TTFSubsetter as it already tracks glyph ids internally, we just need the ability to pass them in too, e.g. addGlyphId(integer). Then PDPageContentStream#showText will be responsible for passing the glyph ids. But now we need showText to know about those glyph ids, which leads me to *Glyph IDs:* The JDK represents text which has been through OpenType layout as a [GlyphVector|https://docs.oracle.com/javase/7/docs/api/java/awt/font/GlyphVector.html] which encapsulates substitutions via GID and positioning via a transform associated with each glyph. PDFBox might want to do something similar, I think it would even be ok to add this to PDType0Font (because I'm suggesting a specific OpenType API so it doesn't interfere with our PDType0Font's non-OpenType assumption) in the form of a method such as: {{PDFGlyphVector layout(String text)}} which is called from PDPageContentStream#showText. I also think it would be fine to use instanceof to detect this case, because only PDType0Font need have this capability. I'm assuming PDFGlyphVector is our own very simple version of the JDK's GlyphVector, which is effectively just a vector of (gid, dx, dy) tuples. Then all that PDPageContentStream#showText needs to know how to do is to draw a PDFGlyphVector on the page, by converting it into the equivalent text drawing operations (Tj and the like). Phew! That was a lot of information. Just to be clear, the current patch is not compatible with subsetting without making some changes. was (Author: jahewson): Hi guys, this is a really welcome contribution, thank you. With regards to PDFont#encode(String text) being non-final I can add some insight as I was the original designer of our current PDFont#encode mechanism. Basically, the PDFont classes are designed to represent fonts identically to how they are represented when embedded in PDF files. So there's no support for OpenType, by design. 
A Type0 font knows nothing about OpenType (but we can relax this a bit, as I explain below). So how can we use OpenType in PDFBox? The answer is that we do it one layer of abstraction up, during text _layout_ instead of text _encoding_*_._* So you want to put your glyph substitution code inside PDPageContentStream#showText, actually you want [PDPageContentStream#showTextInternal|https://github.com/apache/pdfbox/blob/7e721643c0b1fca9fdc349f78431f36e68abc097/pdfbox/src/main/java/org/apache/pdfbox/contentstream/PDAbstractContentStream.java#L256]. That way PDFont#encode(String text) can stay non-final :) *OpenType*: In general, OpenType layouts consist of glyph _substitutions_ (via GSUB) and _positionings_ (via GPOS). Obviously it's not possible to handle positionings in PDFont#encode(), so that helps explain why showText() is the right place for OpenType, as showText performs both positioning and encoding. We also need to keep track of glyphs for subsetti
[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438540#comment-16438540 ] John Hewson edited comment on PDFBOX-4189 at 4/15/18 12:49 AM: --- Hi guys, this is a really welcome contribution, thank you. With regards to PDFont#encode(String text) being non-final I can add some insight as I was the original designer of our current PDFont#encode mechanism. Basically, the PDFont classes are designed to represent fonts identically to how they are represented when embedded in PDF files. So there's no support for OpenType, by design. A Type0 font knows nothing about OpenType (but we can relax this a bit, as I explain below). So how can we use OpenType in PDFBox? The answer is that we do it one layer of abstraction up, during text _layout_ instead of text _encoding_*_._* So you want to put your glyph substitution code inside PDPageContentStream#showText, actually you want [PDPageContentStream#showTextInternal|https://github.com/apache/pdfbox/blob/7e721643c0b1fca9fdc349f78431f36e68abc097/pdfbox/src/main/java/org/apache/pdfbox/contentstream/PDAbstractContentStream.java#L256]. That way PDFont#encode(String text) can stay non-final :) *OpenType*: In general, OpenType layouts consist of glyph _substitutions_ (via GSUB) and _positionings_ (via GPOS). Obviously it's not possible to handle positionings in PDFont#encode(), so that helps explain why showText() is the right place for OpenType, as showText performs both positioning and encoding. We also need to keep track of glyphs for subsetting, which is not possible in encode(). *Subsetting*: We currently track which glyphs need to be included in a subset by using their Unicode code point, but with GSUB enabled we will have to keep track of some substituted glyphs via their glyph id (GID), because the glyphs which result from a substitution don't necessarily have their own code points (no entry in the cmap table).
This should be easy to add to TTFSubsetter as it already tracks glyph ids internally, we just need the ability to pass them in too. Then PDPageContentStream#showText will be responsible for passing the glyph ids. But now we need showText to know about those glyph ids, which leads me to *Glyph IDs:* The JDK represents text which has been through OpenType layout as a [GlyphVector|https://docs.oracle.com/javase/7/docs/api/java/awt/font/GlyphVector.html] which encapsulates substitutions via GID and positioning via a transform associated with each glyph. PDFBox might want to do something similar, I think it would even be ok to add this to PDType0Font (because I'm suggesting a specific OpenType API so it doesn't interfere with our PDType0Font's non-OpenType assumption) in the form of a method such as: {{PDFGlyphVector layout(String text)}} which is called from PDPageContentStream#showText. I also think it would be fine to use instanceof to detect this case, because only PDType0Font need have this capability. I'm assuming PDFGlyphVector is our own very simple version of the JDK's GlyphVector, which is effectively just a vector of (gid, dx, dy) tuples. Then all that PDPageContentStream#showText needs to know how to do is to draw a PDFGlyphVector on the page, by converting it into the equivalent text drawing operations (Tj and the like). Phew! That was a lot of information. Just to be clear, the current patch is not compatible with subsetting without making some changes. was (Author: jahewson): Hi guys, this is a really welcome contribution, thank you. With regards to PDFont#encode(String text) being non-final I can add some insight as I was the original designer of our current PDFont#encode mechanism. Basically, the PDFont classes are designed to represent fonts identically to how they are represented when embedded in PDF files. So there's no support for OpenType, by design. A Type0 font knows nothing about OpenType (but we can relax this a bit, as I explain below). 
So how can we use OpenType in PDFBox? The answer is that we do it one layer of abstraction up, during text _layout_ instead of text _encoding_*_._* So you want to put your glyph substitution code inside PDPageContentStream#showText, actually you want [PDPageContentStream#showTextInternal|https://github.com/apache/pdfbox/blob/7e721643c0b1fca9fdc349f78431f36e68abc097/pdfbox/src/main/java/org/apache/pdfbox/contentstream/PDAbstractContentStream.java#L256]. That way PDFont#encode(String text) can stay non-final :) *OpenType*: In general, OpenType layouts consist of glyph _substitutions_ (via GSUB) and _positionings_ (via GPOS). Obviously it's not possible to handle positionings in PDFont#encode(), so that helps explain why showText() is the right place for OpenType, as showText performs both positioning and encoding. We also need to keep track of glyphs for subsetting, which is not possible
[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438540#comment-16438540 ] John Hewson edited comment on PDFBOX-4189 at 4/15/18 12:48 AM: --- Hi guys, this is a really welcome contribution, thank you. With regards to PDFont#encode(String text) being non-final I can add some insight as I was the original designer of our current PDFont#encode mechanism. Basically, the PDFont classes are designed to represent fonts identically to how they are represented when embedded in PDF files. So there's no support for OpenType, by design. A Type0 font knows nothing about OpenType (but we can relax this a bit, as I explain below). So how can we use OpenType in PDFBox? The answer is that we do it one layer of abstraction up, during text _layout_ instead of text _encoding_*_._* So you want to put your glyph substitution code inside PDPageContentStream#showText, actually you want [PDPageContentStream#showTextInternal|https://github.com/apache/pdfbox/blob/7e721643c0b1fca9fdc349f78431f36e68abc097/pdfbox/src/main/java/org/apache/pdfbox/contentstream/PDAbstractContentStream.java#L256]. That way PDFont#encode(String text) can stay non-final :) *OpenType*: In general, OpenType layouts consist of glyph _substitutions_ (via GSUB) and _positionings_ (via GPOS). Obviously it's not possible to handle positionings in PDFont#encode(), so that helps explain why showText() is the right place for OpenType, as showText performs both positioning and encoding. We also need to keep track of glyphs for subsetting, which is not possible in encode(). *Subsetting*: We currently track which glyphs need to be included in a subset by using their Unicode code point, but with GSUB enabled we will have to keep track of some substituted glyphs via their glyph id (GID), because the glyphs which result from a substitution don't necessarily have their own code points (and so have no entry in the cmap table).
This should be easy to add to TTFSubsetter as it already tracks glyph ids internally, we just need the ability to pass them in too. Then PDPageContentStream#showText will be responsible for passing the glyph ids. But now we need showText to know about those glyph ids, which leads me to *Glyph IDs:* The JDK represents text which has been through OpenType layout as a [GlyphVector|https://docs.oracle.com/javase/7/docs/api/java/awt/font/GlyphVector.html] which encapsulates substitutions via GID and positioning via a transform associated with each glyph. PDFBox might want to do something similar, I think it would even be ok to add this to PDType0Font (because I'm suggesting a specific OpenType API so it doesn't interfere with our PDType0Font's non-OpenType assumption) in the form of a method such as: {{PDFGlyphVector layout(String text)}} which is called from PDPageContentStream#showText. I also think it would be fine to use instanceof to detect this case, because only PDType0Font need have this capability. I'm assuming PDFGlyphVector is our own very simple version of the JDK's GlyphVector, which is effectively just a vector of (gid, dx, dy) tuples. Then all that PDPageContentStream#showText needs to know how to do is to draw a PDFGlyphVector on the page, by converting it into the equivalent text drawing operations (Tj and the like). Phew! That was a lot of information. Just to be clear, the current patch is not compatible with subsetting without making some changes. was (Author: jahewson): Hi guys, this is a really welcome contribution, thank you. With regards to PDFont#encode(String text) being non-final I can add some insight as I was the original designer of our current PDFont#encode mechanism. Basically, the PDFont classes are designed to represent fonts identically to how they are represented when embedded in PDF files. So there's no support for OpenType, by design. A Type0 font knows nothing about OpenType. So how can we use OpenType in PDFBox? 
The answer is that we do it one layer of abstraction up, during text _layout_ instead of text _encoding_*_._* So you want to put your glyph substitution code inside PDPageContentStream#showText, actually you want [PDPageContentStream#showTextInternal|https://github.com/apache/pdfbox/blob/7e721643c0b1fca9fdc349f78431f36e68abc097/pdfbox/src/main/java/org/apache/pdfbox/contentstream/PDAbstractContentStream.java#L256]. That way PDFont#encode(String text) can stay non-final :) *OpenType*: In general, OpenType layouts consist of glyph _substitutions_ (via GSUB) and _positionings_ (via GPOS). Obviously it's not possible to handle positionings in PDFont#encode(), so that helps explain why showText() is the right place for OpenType, as showText performs both positioning and encoding. We also need to keep track of glyphs for subsetting, which is not possible in encode(). *Subsetting*: We current
[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438540#comment-16438540 ] John Hewson edited comment on PDFBOX-4189 at 4/15/18 12:47 AM: --- Hi guys, this is a really welcome contribution, thank you. With regards to PDFont#encode(String text) being non-final I can add some insight as I was the original designer of our current PDFont#encode mechanism. Basically, the PDFont classes are designed to represent fonts identically to how they are represented when embedded in PDF files. So there's no support for OpenType, by design. A Type0 font knows nothing about OpenType (but we can relax this a bit, as I explain below). So how can we use OpenType in PDFBox? The answer is that we do it one layer of abstraction up, during text _layout_ instead of text _encoding_*_._* So you want to put your glyph substitution code inside PDPageContentStream#showText, actually you want [PDPageContentStream#showTextInternal|https://github.com/apache/pdfbox/blob/7e721643c0b1fca9fdc349f78431f36e68abc097/pdfbox/src/main/java/org/apache/pdfbox/contentstream/PDAbstractContentStream.java#L256]. That way PDFont#encode(String text) can stay non-final :) *OpenType*: In general, OpenType layouts consist of glyph _substitutions_ (via GSUB) and _positionings_ (via GPOS). Obviously it's not possible to handle positionings in PDFont#encode(), so that helps explain why showText() is the right place for OpenType, as showText performs both positioning and encoding. We also need to keep track of glyphs for subsetting, which is not possible in encode(). *Subsetting*: We currently track which glyphs need to be included in a subset by using their Unicode code point, but with GSUB enabled we will have to keep track of some substituted glyphs via their glyph id (GID), because the glyphs which result from a substitution don't necessarily have their own code points (and so have no entry in the cmap table).
This should be easy to add to TTFSubsetter as it already tracks glyph ids internally, we just need the ability to pass them in too. Then PDPageContentStream#showText will be responsible for passing the glyph ids. But now we need showText to know about those glyph ids, which leads me to *Glyph IDs:* The JDK represents text which has been through OpenType layout as a [GlyphVector|https://docs.oracle.com/javase/7/docs/api/java/awt/font/GlyphVector.html] which encapsulates substitutions via GID and positioning via a transform associated with each glyph. PDFBox might want to do something similar, I think it would even be ok to add this to PDType0Font (because I'm suggesting a specific OpenType API so it doesn't interfere with our PDType0Font's non-OpenType assumption) in the form of a method such as: {{PDFGlyphVector layout(String text)}} which is called from PDPageContentStream#showText. I also think it would be fine to use instanceof to detect this case, because only PDType0Font need have this capability. I'm assuming PDFGlyphVector is our own very simple version of the JDK's GlyphVector, which is effectively just a vector of (gid, dx, dy) tuples. Then all that PDPageContentStream#showText needs to know how to do is to draw a PDFGlyphVector on the page, by converting it into the equivalent text drawing operations (Tj and the like). Phew! That was a lot of information. Just to be clear, the current patch is not compatible with subsetting without making some changes. was (Author: jahewson): Hi guys, this is a really welcome contribution, thank you. With regards to PDFont#encode(String text) being non-final I can add some insight as I was the original designer of our current PDFont#encode mechanism. Basically, the PDFont classes are designed to represent fonts identically to how they are represented when embedded in PDF files. So there's no support for OpenType, by design. A Type0 font knows nothing about OpenType. So how can we use OpenType in PDFBox? 
The answer is that we do it one layer of abstraction up, during text _layout_ instead of text _encoding_*_._* So you want to put your glyph substitution code inside PDPageContentStream#showText, actually you want [PDPageContentStream#showTextInternal|https://github.com/apache/pdfbox/blob/7e721643c0b1fca9fdc349f78431f36e68abc097/pdfbox/src/main/java/org/apache/pdfbox/contentstream/PDAbstractContentStream.java#L256]. That way PDFont#encode(String text) can stay non-final :) *Technical Background*: In general, OpenType layouts consist of glyph _substitutions_ (via GSUB) and _positionings_ (via GPOS). Obviously it's not possible to handle positionings in PDFont#encode(), so that helps explain why showText() is the right place for OpenType, as showText performs both positioning and encoding.
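The PDFGlyphVector proposed in this comment could look roughly like the following hypothetical Java sketch: a final, immutable list of (gid, dx, dy) entries. Only the class name comes from the discussion; the methods, and the assumption that offsets are expressed in font design units, are illustrative and not an existing PDFBox API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch of the PDFGlyphVector proposed in this thread: an
 * immutable list of (gid, dx, dy) entries. Only the name comes from the
 * discussion; the methods are assumptions.
 */
final class PDFGlyphVector {

    /** One glyph plus its positioning offset (assumed to be in font design units). */
    static final class Entry {
        final int gid;
        final float dx;
        final float dy;

        Entry(int gid, float dx, float dy) {
            this.gid = gid;
            this.dx = dx;
            this.dy = dy;
        }
    }

    private final List<Entry> entries;

    PDFGlyphVector(List<Entry> entries) {
        // Defensive copy, so later changes to the caller's list have no effect.
        this.entries = Collections.unmodifiableList(new ArrayList<>(entries));
    }

    /** GSUB-only result: substituted glyph ids, every offset left at zero. */
    static PDFGlyphVector ofGlyphIds(int... gids) {
        List<Entry> entries = new ArrayList<>();
        for (int gid : gids) {
            entries.add(new Entry(gid, 0f, 0f));
        }
        return new PDFGlyphVector(entries);
    }

    int size() {
        return entries.size();
    }

    Entry get(int index) {
        return entries.get(index);
    }

    /** The bare glyph ids, which is all a GSUB-only showText would need. */
    int[] glyphIds() {
        int[] gids = new int[entries.size()];
        for (int i = 0; i < gids.length; i++) {
            gids[i] = entries.get(i).gid;
        }
        return gids;
    }
}
```

For a GSUB-only patch, `PDFGlyphVector.ofGlyphIds(200, 30)` would be enough, since every offset stays zero; the dx/dy fields only start to matter once GPOS positioning is implemented.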
[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438540#comment-16438540 ] John Hewson edited comment on PDFBOX-4189 at 4/15/18 12:17 AM: --- Hi guys, this is a really welcome contribution, thank you. With regards to PDFont#encode(String text) being non-final I can add some insight as I was the original designer of our current PDFont#encode mechanism. Basically, the PDFont classes are designed to represent fonts identically to how they are represented when embedded in PDF files. So there's no support for OpenType, by design. A Type0 font knows nothing about OpenType. So how can we use OpenType in PDFBox? The answer is that we do it one layer of abstraction up, during text _layout_ instead of text _encoding_*_._* So you want to put your glyph substitution code inside PDPageContentStream#showText, actually you want [PDPageContentStream#showTextInternal|https://github.com/apache/pdfbox/blob/7e721643c0b1fca9fdc349f78431f36e68abc097/pdfbox/src/main/java/org/apache/pdfbox/contentstream/PDAbstractContentStream.java#L256]. That way PDFont#encode(String text) can stay non-final :) *Technical Background*: In general, OpenType layouts consist of glyph _substitutions_ (via GSUB) and _positionings_ (via GPOS). Obviously it's not possible to handle positionings in PDFont#encode(), so that helps explain why showText() is the right place for OpenType, as showText performs both positioning and encoding. was (Author: jahewson): Hi guys, this is a really welcome contribution, thank you. With regards to PDFont#encode(String text) being non-final I can add some insight as I was the original designer of our current PDFont#encode mechanism. Basically, the PDFont classes are designed to represent fonts identically to how they are represented when embedded in PDF files. So there's no support for OpenType, by design. A Type0 font knows nothing about OpenType. So how can we use OpenType in PDFBox?
The answer is that we do it one layer of abstraction up, during text _layout_ instead of text _encoding_*_._* So you want to put your glyph substitution code inside PDPageContentStream#showText, actually you want [PDPageContentStream#showTextInternal|https://github.com/apache/pdfbox/blob/7e721643c0b1fca9fdc349f78431f36e68abc097/pdfbox/src/main/java/org/apache/pdfbox/contentstream/PDAbstractContentStream.java#L256]. That way PDFont#encode(String text) can stay non-final :) In general, OpenType layouts consist of glyph _substitutions_ (via GSUB) and _positionings_ (via GPOS). Obviously it's not possible to handle positionings in PDFont#encode(), so that helps explain why showText() is the right place for OpenType, as showText performs both positioning and encoding.
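Once glyph ids are known, showText still has to emit them as text-showing operators. A minimal Java sketch, assuming a Type0 font with the Identity-H encoding whose CIDs equal the glyph ids, so that each glyph becomes a 2-byte big-endian code written as a PDF hex string. `GlyphTjWriter` is a hypothetical helper, not part of PDFBox.

```java
import java.util.Locale;

/**
 * Sketch of turning glyph ids into a Tj operator. Assumes an Identity-H Type0
 * font whose CIDs equal the glyph ids, so each glyph is a 2-byte big-endian
 * code, written here as a PDF hex string.
 */
final class GlyphTjWriter {
    private GlyphTjWriter() {
    }

    static String showGlyphs(int[] gids) {
        StringBuilder sb = new StringBuilder("<");
        for (int gid : gids) {
            // Identity-H codes are exactly 2 bytes, so larger ids cannot be encoded.
            if (gid < 0 || gid > 0xFFFF) {
                throw new IllegalArgumentException("glyph id out of 2-byte range: " + gid);
            }
            sb.append(String.format(Locale.ROOT, "%04X", gid));
        }
        return sb.append("> Tj").toString();
    }
}
```

`GlyphTjWriter.showGlyphs(new int[] {200, 30})` returns `<00C8001E> Tj`. Whether CIDs really equal GIDs depends on how the font is embedded (for example a CIDToGIDMap written during subsetting), so that assumption would need checking against the actual embedding code.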
[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438540#comment-16438540 ] John Hewson edited comment on PDFBOX-4189 at 4/15/18 12:17 AM: --- Hi guys, this is a really welcome contribution, thank you. With regards to PDFont#encode(String text) being non-final I can add some insight as I was the original designer of our current PDFont#encode mechanism. Basically, the PDFont classes are designed to represent fonts identically to how they are represented when embedded in PDF files. So there's no support for OpenType, by design. A Type0 font knows nothing about OpenType. So how can we use OpenType in PDFBox? The answer is that we do it one layer of abstraction up, during text _layout_ instead of text _encoding_*_._* So you want to put your glyph substitution code inside PDPageContentStream#showText, actually you want [PDPageContentStream#showTextInternal|https://github.com/apache/pdfbox/blob/7e721643c0b1fca9fdc349f78431f36e68abc097/pdfbox/src/main/java/org/apache/pdfbox/contentstream/PDAbstractContentStream.java#L256]. That way PDFont#encode(String text) can stay non-final :) *Technical Background*: In general, OpenType layouts consist of glyph _substitutions_ (via GSUB) and _positionings_ (via GPOS). Obviously it's not possible to handle positionings in PDFont#encode(), so that helps explain why showText() is the right place for OpenType, as showText performs both positioning and encoding. was (Author: jahewson): Hi guys, this is a really welcome contribution, thank you. With regards to PDFont#encode(String text) being non-final I can add some insight as I was the original designer of our current PDFont#encode mechanism. Basically, the PDFont classes are designed to represent fonts identically to how they are represented when embedded in PDF files. So there's no support for OpenType, by design. A Type0 font knows nothing about OpenType. So how can we use OpenType in PDFBox? 
The answer is that we do it one layer of abstraction up, during text _layout_ instead of text _encoding_*_._* So you want to put your glyph substitution code inside PDPageContentStream#showText, actually you want [PDPageContentStream#showTextInternal|https://github.com/apache/pdfbox/blob/7e721643c0b1fca9fdc349f78431f36e68abc097/pdfbox/src/main/java/org/apache/pdfbox/contentstream/PDAbstractContentStream.java#L256]. That way PDFont#encode(String text) can stay non-final :) *Technical Background*: In general, OpenType layouts consist of glyph _substitutions_ (via GSUB) and _positionings_ (via GPOS). Obviously it's not possible to handle positionings in PDFont#encode(), so that helps explain why showText() is the right place for OpenType, as showText performs both positioning and encoding.
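The split between layout and encoding, combined with the instanceof check suggested in the fuller revisions of this comment, can be sketched in Java as follows. `EncodingFont`, `LayoutFont` and `ContentStreamSketch` are hypothetical stand-ins for PDFont, PDType0Font and PDPageContentStream, and the OpenType path assumes Identity-H with CID equal to GID.

```java
import java.util.Locale;

/** Hypothetical stand-in for a font that only knows plain encoding. */
interface EncodingFont {
    byte[] encode(String text);
}

/** Hypothetical stand-in for a font that can also run OpenType layout (GSUB). */
interface LayoutFont extends EncodingFont {
    int[] layout(String text); // substituted glyph ids
}

/** Hypothetical content stream: picks layout or encoding per font, then writes Tj. */
final class ContentStreamSketch {
    private final StringBuilder out = new StringBuilder();

    void showText(EncodingFont font, String text) {
        byte[] codes;
        if (font instanceof LayoutFont) {
            // OpenType path: GSUB yields glyph ids, written as 2-byte big-endian codes.
            int[] gids = ((LayoutFont) font).layout(text);
            codes = new byte[gids.length * 2];
            for (int i = 0; i < gids.length; i++) {
                codes[2 * i] = (byte) (gids[i] >> 8);
                codes[2 * i + 1] = (byte) gids[i];
            }
        } else {
            // Non-OpenType path: the existing per-character encoding, unchanged.
            codes = font.encode(text);
        }
        out.append('<');
        for (byte b : codes) {
            out.append(String.format(Locale.ROOT, "%02X", b & 0xFF));
        }
        out.append("> Tj\n");
    }

    String contents() {
        return out.toString();
    }
}
```

Fonts without layout support keep going through `encode`, so their behaviour is unchanged; only fonts that opt in by implementing `LayoutFont` take the glyph-id path.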
[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438540#comment-16438540 ] John Hewson edited comment on PDFBOX-4189 at 4/15/18 12:16 AM: --- Hi guys, this is a really welcome contribution, thank you. With regard to PDFont#encode(String text) being non-final, I can add some insight, as I was the original designer of our current PDFont#encode mechanism. Basically, the PDFont classes are designed to represent fonts exactly as they are represented when embedded in PDF files. So there's no support for OpenType, by design: a Type0 font knows nothing about OpenType. So how can we use OpenType in PDFBox? The answer is that we do it one layer of abstraction up, during text _layout_ instead of text _encoding_. So you want to put your glyph substitution code inside PDPageContentStream#showText, or more precisely inside [PDPageContentStream#showTextInternal|https://github.com/apache/pdfbox/blob/7e721643c0b1fca9fdc349f78431f36e68abc097/pdfbox/src/main/java/org/apache/pdfbox/contentstream/PDAbstractContentStream.java#L256]. That way PDFont#encode(String text) can stay final :) In general, OpenType layouts consist of glyph _substitutions_ (via GSUB) and _positionings_ (via GPOS). Positionings can't be handled in PDFont#encode(), which only turns text into character codes, whereas showText() performs both positioning and encoding; that is why showText() is the right place for OpenType. was (Author: jahewson): Hi guys, this is a really welcome contribution, thank you. With regards to PDFont#encode(String text) being non-final I can add some insight as I was the original designer of our current PDFont#encode mechanism. Basically, the PDFont classes are designed to represent fonts identically to how they are represented when embedded in PDF files. So there's no support for OpenType, by design. A Type0 font knows nothing about OpenType. So how can we use OpenType in PDFBox? 
The answer is that we do it one layer of abstraction up, during text _layout_ instead of text _encoding_. So you want to put your glyph substitution code inside PDPageContentStream#showText, actually you want [PDPageContentStream#showTextInternal|https://github.com/apache/pdfbox/blob/7e721643c0b1fca9fdc349f78431f36e68abc097/pdfbox/src/main/java/org/apache/pdfbox/contentstream/PDAbstractContentStream.java#L256]. That way PDFont#encode(String text) can stay final :)
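To illustrate why positionings can't live in an encode step that only returns character codes, here is a hedged sketch that turns a shaped glyph run plus a pair adjustment into the operand of the PDF TJ operator. Numbers in a TJ array are in thousandths of a unit of text space and are subtracted from the horizontal position, so a negative (tightening) adjustment in font units becomes a positive TJ number after scaling by 1000 / unitsPerEm. It assumes Identity-H encoding with character codes equal to glyph IDs. The `PAIR_ADJUST` table, its values, and the class name are hypothetical stand-ins for data a GPOS reader would supply; none of this is PDFBox API. Requires Java 9+.

```java
import java.util.List;
import java.util.Map;

public class PositionedRun {

    // Hypothetical GPOS-style pair adjustment: extra x-advance, in font units,
    // between the two glyphs of the pair. Illustrative values only.
    static final Map<List<Integer>, Integer> PAIR_ADJUST = Map.of(List.of(99, 30), -50);

    // Builds a TJ operand such as "[<0063> 50 <001E>] TJ", writing each glyph ID
    // as a two-byte hex code (assumed Identity-H, CID == GID).
    static String buildTJ(List<Integer> glyphs, int unitsPerEm) {
        StringBuilder sb = new StringBuilder("[<");
        for (int i = 0; i < glyphs.size(); i++) {
            sb.append(String.format("%04X", glyphs.get(i)));
            if (i + 1 < glyphs.size()) {
                Integer adj = PAIR_ADJUST.get(List.of(glyphs.get(i), glyphs.get(i + 1)));
                if (adj != null && adj != 0) {
                    // TJ numbers are thousandths of text space and are subtracted
                    // from the current x position, hence the sign flip.
                    long tjNumber = Math.round(-adj * 1000.0 / unitsPerEm);
                    sb.append("> ").append(tjNumber).append(" <");
                }
            }
        }
        return sb.append(">] TJ").toString();
    }

    public static void main(String[] args) {
        System.out.println(buildTJ(List.of(99, 30), 1000));     // [<0063> 50 <001E>] TJ
        System.out.println(buildTJ(List.of(99, 30, 30), 2048)); // [<0063> 24 <001E001E>] TJ
    }
}
```

PDFont#encode(String text) only returns a byte[], so its result has no place for the `50`; the adjustment has to be emitted by the code that writes the text-showing operator, which is the layer the comment points to.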
[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438540#comment-16438540 ] John Hewson edited comment on PDFBOX-4189 at 4/15/18 12:11 AM: --- Hi guys, this is a really welcome contribution, thank you. With regard to PDFont#encode(String text) being non-final, I can add some insight, as I was the original designer of our current PDFont#encode mechanism. Basically, the PDFont classes are designed to represent fonts exactly as they are represented when embedded in PDF files. So there's no support for OpenType, by design: a Type0 font knows nothing about OpenType. So how can we use OpenType in PDFBox? The answer is that we do it one layer of abstraction up, during text _layout_ instead of text _encoding_. So you want to put your glyph substitution code inside PDPageContentStream#showText, or more precisely inside [PDPageContentStream#showTextInternal|https://github.com/apache/pdfbox/blob/7e721643c0b1fca9fdc349f78431f36e68abc097/pdfbox/src/main/java/org/apache/pdfbox/contentstream/PDAbstractContentStream.java#L256]. That way PDFont#encode(String text) can stay final :) was (Author: jahewson): Hi guys, this is a really welcome contribution, thank you. With regards to PDFont#encode(String text) being non-final I can add some insight as I was the original designer of our current PDFont#encode mechanism. Basically, the PDFont classes are designed to represent fonts identically to how they are represented when embedded in PDF files. So there's no support for OpenType, by design. A Type0 font knows nothing about OpenType (by design). So how can we use OpenType in PDFBox? 
The answer is that we do it one layer of abstraction up, during text _layout_ instead of text _encoding_. So you want to put your glyph substitution code inside PDPageContentStream#showText, actually you want [PDPageContentStream#showTextInternal|https://github.com/apache/pdfbox/blob/7e721643c0b1fca9fdc349f78431f36e68abc097/pdfbox/src/main/java/org/apache/pdfbox/contentstream/PDAbstractContentStream.java#L256]. That way PDFont#encode(String text) can stay final :)