[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table

2018-04-21 Thread Tilman Hausherr (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16446711#comment-16446711
 ] 

Tilman Hausherr edited comment on PDFBOX-4189 at 4/21/18 9:15 AM:
--

that's just a warm-up and to get rid of (some) binaries in the patch.


was (Author: tilman):
that's just a warm-up and to get rid of binaries in the patch.

> Enable rendering of Indian languages, by reading and utilizing the GSUB table
> -
>
> Key: PDFBOX-4189
> URL: https://issues.apache.org/jira/browse/PDFBOX-4189
> Project: PDFBox
>  Issue Type: New Feature
>  Components: FontBox, PDModel
>Reporter: Palash Ray
>Priority: Major
> Attachments: Bengali-text-after.pdf, Bengali-text-before.pdf
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> Implemented proper rendering of Indian languages, which need extensive Glyph 
> substitution. The GSUB table has been read and used effectively to replace 
> some compound words with their respective Glyphs. All tests are passing. I 
> have tested this for the Bengali font. Please review these changes and let me 
> know if it makes sense to incorporate these.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org

[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table

2018-04-21 Thread Palash Ray (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16446685#comment-16446685
 ] 

Palash Ray edited comment on PDFBOX-4189 at 4/21/18 8:28 AM:
-

I know. If you ask me, it's a real shame. The whole point of having 
abstractions and specifications is that we should be able to figure out pretty 
much all the rules without having to write language-specific handlers. But I 
think even the font developers are to blame. They should push the big companies 
who write these specifications to do a better job. Anyway, sorry for the rant :)


was (Author: paawak):
I know. If you ask me, its a real shame. The reason we have abstractions and 
specifications, we are supposed to be able to figure out pretty much the rules, 
without having to write language specific handlers. But I think even the font 
developers are to blame. They should push these big companies who build these 
specifications to do a better job. Anyway, sorry for the rant :)


[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table

2018-04-15 Thread John Hewson (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438914#comment-16438914
 ] 

John Hewson edited comment on PDFBOX-4189 at 4/16/18 1:45 AM:
--

{quote}
For correct text positioning using mixed language information from the 
following tables might be useful:
- GPOS: to adjust the glyph position
- BASE: baseline offsets on a script-by-script basis.
- JSTF: justification information, including whitespace and Kashida adjustments.
- BIDI Mirroring: 
https://www.unicode.org/Public/10.0.0/ucd/BidiMirroring.txt{quote}

It's probably worth noting that BASE, JSTF and BiDi are concerned with 
_paragraph-level_ layout, which happens at a higher level than the proposed 
layout() - which would be concerned with only a single script in a single 
direction (i.e. only OpenType _shaping_). BASE and BiDi are related to changes 
between different scripts, while JSTF is to aid in making good line break 
choices. So all of that functionality will happen somewhere else (this fits 
very closely with the layout code we have for forms, for example). So in layout 
we're really only going to be concerned with GPOS and GSUB features. That way 
the only options that one might want to pass to layout would be the list of 
which [feature 
flags|https://docs.microsoft.com/en-us/typography/opentype/spec/featurelist] to 
apply.

Maybe layout() should be called shapeText() to emphasize this distinction?
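
To make the shaping-only scope concrete, here is a minimal sketch. It assumes a hypothetical {{ShapingSketch}} class and a toy ligature map standing in for real GSUB lookups; none of these names exist in PDFBox. The only option passed to {{shapeText}} is the set of enabled feature tags:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not PDFBox API: shaping for one run of a single script
// in a single direction, where the only option is the set of enabled features.
final class ShapingSketch
{
    // Toy stand-in for GSUB data: feature tag -> (input GID pair -> ligature GID).
    private final Map<String, Map<List<Integer>, Integer>> ligaturesByFeature = new HashMap<>();

    void addLigature(String featureTag, int first, int second, int ligatureGid)
    {
        ligaturesByFeature
            .computeIfAbsent(featureTag, k -> new HashMap<>())
            .put(Arrays.asList(first, second), ligatureGid);
    }

    // Applies only substitutions whose feature tag is enabled, in one
    // left-to-right pass. A real shaper would follow the font's lookup order;
    // this toy version simply takes the first matching feature.
    List<Integer> shapeText(List<Integer> gids, Set<String> enabledFeatures)
    {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        while (i < gids.size())
        {
            Integer ligature = null;
            if (i + 1 < gids.size())
            {
                List<Integer> pair = Arrays.asList(gids.get(i), gids.get(i + 1));
                for (String tag : enabledFeatures)
                {
                    Map<List<Integer>, Integer> table = ligaturesByFeature.get(tag);
                    if (table != null && table.containsKey(pair))
                    {
                        ligature = table.get(pair);
                        break;
                    }
                }
            }
            if (ligature != null)
            {
                out.add(ligature); // two input glyphs collapse into one
                i += 2;
            }
            else
            {
                out.add(gids.get(i)); // no substitution applies
                i++;
            }
        }
        return out;
    }
}
```

After registering a ligature 99 for the pair (10, 11) under {{liga}}, shaping the GIDs 10, 11, 12 with {{liga}} enabled gives 99, 12; with no features enabled the input comes back unchanged.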


was (Author: jahewson):
{quote}
For correct text positioning using mixed language information from the 
following tables might be useful:
- GPOS: to adjust the glyph position
- BASE: baseline offsets on a script-by-script basis.
- JSTF: justification information, including whitespace and Kashida adjustments.
- BIDI Mirroring: 
https://www.unicode.org/Public/10.0.0/ucd/BidiMirroring.txt{quote}

It's probably worth noting that BASE, JSTF and BiDi are concerned with 
_paragraph-level_ layout, which happens at a higher level than the proposed 
layout() - which would be concerned with only a single script in a single 
direction (i.e. only OpenType _shaping_). BASE and BiDi are related to changes 
between different scripts, while JSTF is to aid in making good line break 
choices. So all of that functionality will happen somewhere else (this fits 
very closely with the layout code we have for forms, for example). So in layout 
we're really only going to be concerned with GPOS and GSUB features. That way 
the only options that one might want to pass to layout would be this list of 
which [feature 
flags|https://docs.microsoft.com/en-us/typography/opentype/spec/featurelist] to 
apply.

Maybe layout() should be called shapeText() to emphasize this distinction?


[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table

2018-04-15 Thread John Hewson (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438914#comment-16438914
 ] 

John Hewson edited comment on PDFBOX-4189 at 4/16/18 1:44 AM:
--

{quote}
For correct text positioning using mixed language information from the 
following tables might be useful:
- GPOS: to adjust the glyph position
- BASE: baseline offsets on a script-by-script basis.
- JSTF: justification information, including whitespace and Kashida adjustments.
- BIDI Mirroring: 
https://www.unicode.org/Public/10.0.0/ucd/BidiMirroring.txt{quote}

It's probably worth noting that BASE, JSTF and BiDi are concerned with 
_paragraph-level_ layout, which happens at a higher level than the proposed 
layout() - which would be concerned with only a single script in a single 
direction (i.e. only OpenType _shaping_). BASE and BiDi are related to changes 
between different scripts, while JSTF is to aid in making good line break 
choices. So all of that functionality will happen somewhere else (this fits 
very closely with the layout code we have for forms, for example). So in layout 
we're really only going to be concerned with GPOS and GSUB features. That way 
the only options that one might want to pass to layout would be this list of 
which [feature 
flags|https://docs.microsoft.com/en-us/typography/opentype/spec/featurelist] to 
apply.

Maybe layout() should be called shapeText() to emphasize this distinction?


was (Author: jahewson):
{quote}
For correct text positioning using mixed language information from the 
following tables might be useful:
- GPOS: to adjust the glyph position
- BASE: baseline offsets on a script-by-script basis.
- JSTF: justification information, including whitespace and Kashida adjustments.
- BIDI Mirroring: 
https://www.unicode.org/Public/10.0.0/ucd/BidiMirroring.txt{quote}

It's probably worth noting that BASE, JSTF and BiDi are concerned with 
_paragraph-level_ layout, which happens at a higher level than the proposed 
layout() - which would be concerned with only a single script in a single 
direction (i.e. only OpenType _shaping_). BASE and BiDi are related to changes 
between different scripts, while JSTF is to aid in making good line break 
choices. So all of that functionality will happen somewhere else (this fits 
very closely with the layout code form forms, for example). So in layout we're 
really only going to be concerned with GPOS and GSUB features. That way the 
only options that one might want to pass to layout would be this list of which 
[feature 
flags|https://docs.microsoft.com/en-us/typography/opentype/spec/featurelist] to 
apply.

Maybe layout() should be called shapeText() to emphasize this distinction?


[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table

2018-04-15 Thread John Hewson (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438914#comment-16438914
 ] 

John Hewson edited comment on PDFBOX-4189 at 4/16/18 1:41 AM:
--

{quote}
For correct text positioning using mixed language information from the 
following tables might be useful:
- GPOS: to adjust the glyph position
- BASE: baseline offsets on a script-by-script basis.
- JSTF: justification information, including whitespace and Kashida adjustments.
- BIDI Mirroring: 
https://www.unicode.org/Public/10.0.0/ucd/BidiMirroring.txt{quote}

It's probably worth noting that BASE, JSTF and BiDi are concerned with 
_paragraph-level_ layout, which happens at a higher level than the proposed 
layout() - which would be concerned with only a single script in a single 
direction (i.e. only OpenType _shaping_). BASE and BiDi are related to changes 
between different scripts, while JSTF is to aid in making good line break 
choices. So all of that functionality will happen somewhere else (this fits 
very closely with the layout code form forms, for example). So in layout we're 
really only going to be concerned with GPOS and GSUB features. That way the 
only options that one might want to pass to layout would be this list of which 
[feature 
flags|https://docs.microsoft.com/en-us/typography/opentype/spec/featurelist] to 
apply.

Maybe layout() should be called shapeText() to emphasize this distinction?


was (Author: jahewson):
{quote}
For correct text positioning using mixed language information from the 
following tables might be useful:
- GPOS: to adjust the glyph position
- BASE: baseline offsets on a script-by-script basis.
- JSTF: justification information, including whitespace and Kashida adjustments.
- BIDI Mirroring: 
https://www.unicode.org/Public/10.0.0/ucd/BidiMirroring.txt{quote}

It's probably worth noting that BASE, JSTF and BiDi are concerned with 
_paragraph-level_ layout, which happens at a higher level than the proposed 
layout() - which would be concerned with only a single script in a single 
direction (i.e. only OpenType _shaping_). BASE and BiDi are related to changes 
between different scripts, while JSTF is to aid in making good line break 
choices. So all of that functionality will happen somewhere else (this fits 
very closely with the layout code form forms, for example). So in layout we're 
really only going to be concerned with GPOS and GSUB features. That way the 
only options that one might want to pass to layout would be this list of which 
[feature 
flags|https://docs.microsoft.com/en-us/typography/opentype/spec/featurelist] to 
apply.


[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table

2018-04-15 Thread John Hewson (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438914#comment-16438914
 ] 

John Hewson edited comment on PDFBOX-4189 at 4/16/18 1:40 AM:
--

{quote}
For correct text positioning using mixed language information from the 
following tables might be useful:
- GPOS: to adjust the glyph position
- BASE: baseline offsets on a script-by-script basis.
- JSTF: justification information, including whitespace and Kashida adjustments.
- BIDI Mirroring: 
https://www.unicode.org/Public/10.0.0/ucd/BidiMirroring.txt{quote}

It's probably worth noting that BASE, JSTF and BiDi are concerned with 
_paragraph-level_ layout, which happens at a higher level than the proposed 
layout() - which would be concerned with only a single script in a single 
direction (i.e. only OpenType _shaping_). BASE and BiDi are related to changes 
between different scripts, while JSTF is to aid in making good line break 
choices. So all of that functionality will happen somewhere else (this fits 
very closely with the layout code form forms, for example). So in layout we're 
really only going to be concerned with GPOS and GSUB features. That way the 
only options that one might want to pass to layout would be this list of which 
[feature 
flags|https://docs.microsoft.com/en-us/typography/opentype/spec/featurelist] to 
apply.


was (Author: jahewson):
For correct text positioning using mixed language information from the 
following tables might be useful:
- GPOS: to adjust the glyph position
- BASE: baseline offsets on a script-by-script basis.
- JSTF: justification information, including whitespace and Kashida adjustments.
- BIDI Mirroring: https://www.unicode.org/Public/10.0.0/ucd/BidiMirroring.txt

bq. here

BASE, JSTF and BiDi are concerned with _paragraph-level_ layout, which happens 
at a higher level than the proposed layout() - which would be concerned with 
only a single script in a single direction (i.e. only OpenType _shaping_). BASE 
and BiDi are related to changes between different scripts, while JSTF is to aid 
in making good line break choices. So all of that functionality will happen 
somewhere else (this fits very closely with the layout code form forms, for 
example). So in layout we're really only going to be concerned with GPOS and 
GSUB features. That way the only options that one might want to pass to layout 
would be this list of which [feature 
flags|https://docs.microsoft.com/en-us/typography/opentype/spec/featurelist] to 
apply.


[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table

2018-04-15 Thread Maruan Sahyoun (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438609#comment-16438609
 ] 

Maruan Sahyoun edited comment on PDFBOX-4189 at 4/15/18 8:34 AM:
-

The patch is a great and - given several questions we had in the past - 
important addition to PDFBox.

In the longer run I'd see some additions we might already think about 
conceptually and/or start introducing in the public API. As I haven't reviewed 
the patch, the list below is meant as a hint for possible additions; they may 
already be included.

For correct text positioning with mixed languages, information from the 
following tables might be useful:
- GPOS: to adjust the glyph position
- BASE: baseline offsets on a script-by-script basis.
- JSTF: justification information, including whitespace and Kashida adjustments.
- BIDI Mirroring: https://www.unicode.org/Public/10.0.0/ucd/BidiMirroring.txt

To allow the user to override the language system identified by the script 
being used, we might want to add {{setLanguage/getLanguage}}, which can be 
called prior to {{showText}} if an override needs to be done.

Putting that into an internal {{layout}} method as John suggested would also 
allow us to put it behind a feature flag where one could enable/disable the 
processing. We might also mark that feature as **experimental** and specify 
which languages it has been tested with (to some extent).
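
As a rough illustration of the two ideas above, here is a minimal sketch. The class {{ShapingOptionsSketch}} and all of its methods are hypothetical, not existing PDFBox API. It shows an opt-in switch for the experimental processing plus a language override that falls back to whatever was detected from the script:

```java
import java.util.Optional;

// Hypothetical sketch, not PDFBox API: options consulted before showText.
final class ShapingOptionsSketch
{
    private boolean shapingEnabled = false; // experimental, so off by default
    private String languageOverride;        // null means "derive from the script"

    void setShapingEnabled(boolean enabled)
    {
        shapingEnabled = enabled;
    }

    boolean isShapingEnabled()
    {
        return shapingEnabled;
    }

    void setLanguage(String languageTag)
    {
        languageOverride = languageTag;
    }

    Optional<String> getLanguage()
    {
        return Optional.ofNullable(languageOverride);
    }

    // Returns the override if one was set, otherwise the language system
    // that the caller detected from the script of the text.
    String resolveLanguage(String detectedFromScript)
    {
        return languageOverride != null ? languageOverride : detectedFromScript;
    }
}
```

A caller would leave the override unset in the common case and call {{setLanguage}} only when the language detected from the script is not the one wanted.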

This is mainly meant to understand which capabilities belong where as I'm 
looking to add the processing to layout of interactive form field values.


was (Author: msahyoun):
The patch is a great and - given several questions we had in the past - 
important addition to PDFBox.

On the longer run I'd see some additions we might conceptually already think 
about and/or start introducing in the public API. As I haven't reviewed the 
patch the below list is meant to be a hint for possible addition. They may 
already be included

For correct text positioning using mixed language information from the 
following tables might be useful:
- GPOS: to adjust the glyph position
- BASE: baseline offsets on a script-by-script basis.
- JSTF: justification information, including whitespace and Kashida adjustments.
- BIDI Mirroring: https://www.unicode.org/Public/10.0.0/ucd/BidiMirroring.txt

To allow the user to override the language system identified by the script 
being used we might want to add {{setLanguage/getLanguage}} so that can be 
called prior to {{showText}} if an override needs to be done.

Putting that into an internal {{layout}} method as John suggested would also 
allow us to put it behind a feature flag where one could enable/disable the 
processing. We might also mark that feature as **experimental** and specify 
which languages it has been tested with (to some extend).


[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table

2018-04-14 Thread John Hewson (JIRA)

[
https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438540#comment-16438540
]

John Hewson edited comment on PDFBOX-4189 at 4/15/18 1:04 AM:
--

Hi guys, this is a really welcome contribution, thank you. With regards to
PDFont#encode(String text) being non-final I can add some insight as I was the
original designer of our current PDFont#encode mechanism.

Basically, the PDFont classes are designed to represent fonts identically to
how they are represented when embedded in PDF files. So there's no support for
OpenType, by design. A Type0 font knows nothing about OpenType (but we can
relax this a bit, as I explain below).

So how can we use OpenType in PDFBox? The answer is that we do it one layer of
abstraction up, during text _layout_ instead of text _encoding_. So you want
to put your glyph substitution code inside PDPageContentStream#showText, or
more precisely inside
[PDPageContentStream#showTextInternal|https://github.com/apache/pdfbox/blob/7e721643c0b1fca9fdc349f78431f36e68abc097/pdfbox/src/main/java/org/apache/pdfbox/contentstream/PDAbstractContentStream.java#L256].

That way PDFont#encode(String text) can stay non-final :)

*OpenType*: In general, OpenType layouts consist of glyph _substitutions_ (via
GSUB) and _positionings_ (via GPOS). Obviously it's not possible to handle
positionings in PDFont#encode(), so that helps explain why showText() is the
right place for OpenType, as showText performs both positioning and encoding.
We also need to keep track of glyphs for subsetting, which is not possible in
encode().

*Subsetting*: We currently track which glyphs need to be included in a subset
by using their Unicode code point, but with GSUB enabled we will have to keep
track of some substituted glyphs via their glyph id (GID), because the glyphs
which result from a substitution don't necessarily have their own code points
(no entry in the cmap table). This should be easy to add to TTFSubsetter as it
already tracks glyph ids internally, we just need the ability to pass them in
too, e.g. addGlyphId(integer). Then PDPageContentStream#showText will be
responsible for passing the glyph ids. But now we need showText to know about
those glyph ids, which leads me to

*Glyph IDs:* The JDK represents text which has been through OpenType layout as
a
[GlyphVector|https://docs.oracle.com/javase/7/docs/api/java/awt/font/GlyphVector.html]
which encapsulates substitutions via GID and positioning via a transform
associated with each glyph. PDFBox might want to do something similar, I think
it would even be ok to add this to PDType0Font (because I'm suggesting a
specific OpenType API so it doesn't interfere with our PDType0Font's
non-OpenType assumption) in the form of a method such as: {{final
PDFGlyphVector layout(String text)}} which is called from
PDPageContentStream#showText instead of encode(text). I also think it would be
fine to use instanceof to detect this case, because only PDType0Font need have
this capability. I'm assuming PDFGlyphVector is our own very simple version of
the JDK's GlyphVector, which is effectively just a vector of (gid, dx, dy)
tuples. Then all that PDPageContentStream#showText needs to know how to do is
to draw a PDFGlyphVector on the page, by converting it into the equivalent text
drawing operations (Tj and the like). Because this patch is just for GSUB, all
of those positioning values can just be zero, and we need not implement any
actual glyph positioning in showText() yet :). Thus GlyphVector will serve
simply as an array of GIDs.
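
A minimal sketch of such a vector might look like the following. The class {{GlyphVectorSketch}} is hypothetical and only stands in for the proposed PDFGlyphVector. The hex encoding assumes an Identity-H Type0 font whose CIDs equal the GIDs, so each glyph becomes one 2-byte code:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the (gid, dx, dy) vector; not PDFBox API.
final class GlyphVectorSketch
{
    static final class Glyph
    {
        final int gid;
        final float dx;
        final float dy;

        Glyph(int gid, float dx, float dy)
        {
            this.gid = gid;
            this.dx = dx;
            this.dy = dy;
        }
    }

    private final List<Glyph> glyphs = new ArrayList<>();

    // GSUB-only case: the positioning offsets are simply zero.
    void add(int gid)
    {
        glyphs.add(new Glyph(gid, 0f, 0f));
    }

    List<Glyph> getGlyphs()
    {
        return Collections.unmodifiableList(glyphs);
    }

    // Builds the hex string operand for a Tj operator, writing each GID as a
    // 2-byte big-endian code (assumption: Identity-H with CID == GID).
    String toHexString()
    {
        StringBuilder sb = new StringBuilder("<");
        for (Glyph g : glyphs)
        {
            sb.append(String.format("%04X", g.gid));
        }
        return sb.append('>').toString();
    }
}
```

With only GSUB in play, showText could emit the result of {{toHexString()}} followed by Tj and ignore dx and dy; once GPOS is handled, non-zero offsets would need positioned output such as TJ instead.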

Phew! That was a lot of information. Just to be clear, the current patch is not
compatible with subsetting without making some changes. P.S. Make sure any new
APIs are {{final}}. All of the suggestions above consist of adding only
non-breaking APIs, which is nice.
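
To illustrate the subsetting change, here is a minimal sketch of a tracker that accepts both code points and raw glyph ids. {{SubsetTrackerSketch}} and its toy cmap are hypothetical and do not reflect the actual TTFSubsetter internals; always keeping GID 0 (.notdef) is likewise an assumption of the sketch:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch, not the real TTFSubsetter: collects the glyph ids that
// a subset must keep, whether they come from code points or from GSUB output.
final class SubsetTrackerSketch
{
    private final Map<Integer, Integer> cmap = new HashMap<>(); // toy cmap: code point -> GID
    private final SortedSet<Integer> glyphIds = new TreeSet<>();

    SubsetTrackerSketch(Map<Integer, Integer> cmap)
    {
        this.cmap.putAll(cmap);
        glyphIds.add(0); // assumption: .notdef is always kept
    }

    // Existing style of tracking: resolve a Unicode code point via the cmap.
    void add(int codePoint)
    {
        Integer gid = cmap.get(codePoint);
        if (gid != null)
        {
            glyphIds.add(gid);
        }
    }

    // New entry point: substituted glyphs may have no code point at all,
    // so they are registered directly by glyph id.
    void addGlyphId(int gid)
    {
        glyphIds.add(gid);
    }

    SortedSet<Integer> getGlyphIds()
    {
        return Collections.unmodifiableSortedSet(glyphIds);
    }
}
```

In this sketch showText would call {{addGlyphId}} for every GID in the shaped output, so that ligature glyphs without a cmap entry still end up in the subset.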

Thanks!

was (Author: jahewson):
Hi guys, this is a really welcome contribution, thank you. With regards to
PDFont#encode(String text) being non-final I can add some insight as I was the
original designer of our current PDFont#encode mechanism.

That way PDFont#encode(Stri

[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table

2018-04-14 Thread John Hewson (JIRA)

[
https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438540#comment-16438540
]

John Hewson edited comment on PDFBOX-4189 at 4/15/18 12:57 AM:
---

That way PDFont#encode(String text) can stay non-final :)

Phew! That was a lot of information. Just to be clear, the current patch is not
compatible with subsetting without making some changes. P.S. Make sure any new
APIs are {{final}}.

That way PDFont#encode(String text) can stay non-final :)

*OpenType*: In general, OpenType layouts consist of glyph _sub

[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table

2018-04-14 Thread John Hewson (JIRA)

[
https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438540#comment-16438540
]

John Hewson edited comment on PDFBOX-4189 at 4/15/18 12:52 AM:
---

That way PDFont#encode(String text) can stay non-final :)

*Glyph IDs:* The JDK represents text which has been through OpenType layout as
a
[GlyphVector|https://docs.oracle.com/javase/7/docs/api/java/awt/font/GlyphVector.html]
which encapsulates substitutions via GID and positioning via a transform
associated with each glyph. PDFBox might want to do something similar, I think
it would even be ok to add this to PDType0Font (because I'm suggesting a
specific OpenType API so it doesn't interfere with our PDType0Font's
non-OpenType assumption) in the form of a method such as: {{PDFGlyphVector
layout(String text)}} which is called from PDPageContentStream#showText instead
of encode(text). I also think it would be fine to use instanceof to detect this
case, because only PDType0Font need have this capability. I'm assuming
PDFGlyphVector is our own very simple version of the JDK's GlyphVector, which
is effectively just a vector of (gid, dx, dy) tuples. Then all that
PDPageContentStream#showText needs to know how to do is to draw a
PDFGlyphVector on the page, by converting it into the equivalent text drawing
operations (Tj and the like).

Phew! That was a lot of information. Just to be clear, the current patch is not
compatible with subsetting without making some changes.

That way PDFont#encode(String text) can stay non-final :)

[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table

2018-04-14 Thread John Hewson (JIRA)

[
https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438540#comment-16438540
]

John Hewson edited comment on PDFBOX-4189 at 4/15/18 12:51 AM:
---

That way PDFont#encode(String text) can stay non-final :)

*Glyph IDs:* The JDK represents text which has been through OpenType layout as
a
[GlyphVector|https://docs.oracle.com/javase/7/docs/api/java/awt/font/GlyphVector.html]
which encapsulates substitutions via GID and positioning via a transform
associated with each glyph. PDFBox might want to do something similar, I think
it would even be ok to add this to PDType0Font (because I'm suggesting a
specific OpenType API so it doesn't interfere with our PDType0Font's
non-OpenType assumption) in the form of a method such as: {{PDFGlyphVector
layout(String text)}} which is called from PDPageContentStream#showText. I also
think it would be fine to use instanceof to detect this case, because only
PDType0Font need have this capability. I'm assuming PDFGlyphVector is our own
very simple version of the JDK's GlyphVector, which is effectively just a
vector of (gid, dx, dy) tuples. Then all that PDPageContentStream#showText
needs to know how to do is to draw a PDFGlyphVector on the page, by converting
it into the equivalent text drawing operations (Tj and the like).

Phew! That was a lot of information. Just to be clear, the current patch is not
compatible with subsetting without making some changes.

That way PDFont#encode(String text) can stay non-final :)

[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table

2018-04-14 Thread John Hewson (JIRA)

[
https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438540#comment-16438540
]

John Hewson edited comment on PDFBOX-4189 at 4/15/18 12:49 AM:
---

That way PDFont#encode(String text) can stay non-final :)

*Subsetting*: We currently track which glyphs need to be included in a subset
by using their Unicode code point, but with GSUB enabled we will have to keep
track of some substituted glyphs via their glyph id (GID), because the glyphs
which result from a substitution don't necessarily have their own code points
(no entry in the cmap table). This should be easy to add to TTFSubsetter as it
already tracks glyph ids internally, we just need the ability to pass them in
too. Then PDPageContentStream#showText will be responsible for passing the
glyph ids. But now we need showText to know about those glyph ids, which leads
me to

*Glyph IDs:* The JDK represents text which has been through OpenType layout as
a
[GlyphVector|https://docs.oracle.com/javase/7/docs/api/java/awt/font/GlyphVector.html]
which encapsulates substitutions via GID and positioning via a transform
associated with each glyph. PDFBox might want to do something similar, I think
it would even be ok to add this to PDType0Font (because I'm suggesting a
specific OpenType API so it doesn't interfere with our PDType0Font's
non-OpenType assumption) in the form of a method such as: {{PDFGlyphVector
layout(String text)}} which is called from PDPageContentStream#showText. I also
think it would be fine to use instanceof to detect this case, because only
PDType0Font need have this capability. I'm assuming PDFGlyphVector is our own
very simple version of the JDK's GlyphVector, which is effectively just a
vector of (gid, dx, dy) tuples. Then all that PDPageContentStream#showText
needs to know how to do is to draw a PDFGlyphVector on the page, by converting
it into the equivalent text drawing operations (Tj and the like).
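
A minimal sketch of what such a glyph vector could look like, assuming it is nothing more than an ordered list of (gid, dx, dy) tuples. The class below is hypothetical and not an existing PDFBox API. The {{toHexString}} helper further assumes a Type 0 font whose two-byte codes map directly to glyph ids (for example Identity-H with an identity CID-to-GID mapping), and it ignores the dx/dy offsets, which would need separate positioning operators.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of a simple glyph vector; not an existing PDFBox class. */
final class PDFGlyphVector
{
    /** One laid-out glyph: a glyph id plus a positioning offset. */
    static final class Glyph
    {
        final int gid;
        final float dx;
        final float dy;

        Glyph(int gid, float dx, float dy)
        {
            this.gid = gid;
            this.dx = dx;
            this.dy = dy;
        }
    }

    private final List<Glyph> glyphs = new ArrayList<>();

    /** Appends a glyph in visual order. */
    void add(int gid, float dx, float dy)
    {
        glyphs.add(new Glyph(gid, dx, dy));
    }

    List<Glyph> getGlyphs()
    {
        return Collections.unmodifiableList(glyphs);
    }

    /**
     * Writes the glyph ids as a PDF hex string, two bytes per glyph.
     * Assumption: the font's codes are the glyph ids themselves.
     * Offsets are not emitted here.
     */
    String toHexString()
    {
        StringBuilder sb = new StringBuilder("<");
        for (Glyph g : glyphs)
        {
            sb.append(String.format("%04X", g.gid));
        }
        return sb.append('>').toString();
    }
}
```

In this shape, showText would only need to walk the list and emit the hex string followed by Tj, splitting the run wherever a glyph carries a non-zero offset.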

Phew! That was a lot of information. Just to be clear, the current patch is not
compatible with subsetting without making some changes.
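
To illustrate the subsetting change described above, here is a rough sketch of tracking both code points and raw glyph ids. {{SubsetTracker}} and its methods are hypothetical names chosen for illustration; they are not part of TTFSubsetter or PDFBox.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper that records what a font subset must keep. */
final class SubsetTracker
{
    private final Set<Integer> codePoints = new TreeSet<>();
    private final Set<Integer> glyphIds = new TreeSet<>();

    /** Current approach as described: track glyphs by Unicode code point. */
    void addCodePoints(String text)
    {
        text.codePoints().forEach(codePoints::add);
    }

    /** Proposed addition: track a substituted glyph that has no code point. */
    void addGlyphId(int gid)
    {
        glyphIds.add(gid);
    }

    Set<Integer> getCodePoints()
    {
        return codePoints;
    }

    Set<Integer> getGlyphIds()
    {
        return glyphIds;
    }
}
```

The subsetter would then keep the union of the glyphs reachable from the code points and the explicitly added glyph ids.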

That way PDFont#encode(String text) can stay non-final :)

[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table

2018-04-14 Thread John Hewson (JIRA)

[
https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438540#comment-16438540
]

John Hewson edited comment on PDFBOX-4189 at 4/15/18 12:48 AM:
---

That way PDFont#encode(String text) can stay non-final :)

*Subsetting*: We currently track which glyphs need to be included in a subset
by using their Unicode code point, but with GSUB enabled we will have to keep
track of some substituted glyphs via their glyph id (GID), because the glyphs
which result from a substitution don't necessarily have their own code points
(and so have no entry in the cmap table). This should be easy to add to
TTFSubsetter as it already tracks glyph ids internally, we just need the
ability to pass them in too. Then PDPageContentStream#showText will be
responsible for passing the glyph ids. But now we need showText to know about
those glyph ids, which leads me to

*Glyph IDs:* The JDK represents text which has been through OpenType layout as
a
[GlyphVector|https://docs.oracle.com/javase/7/docs/api/java/awt/font/GlyphVector.html]
which encapsulates substitutions via GID and positioning via a transform
associated with each glyph. PDFBox might want to do something similar, I think
it would even be ok to add this to PDType0Font (because I'm suggesting a
specific OpenType API so it doesn't interfere with our PDType0Font's
non-OpenType assumption) in the form of a method such as: {{PDFGlyphVector
layout(String text)}} which is called from PDPageContentStream#showText. I also
think it would be fine to use instanceof to detect this case, because only
PDType0Font need have this capability. I'm assuming PDFGlyphVector is our own
very simple version of the JDK's GlyphVector, which is effectively just a
vector of (gid, dx, dy) tuples. Then all that PDPageContentStream#showText
needs to know how to do is to draw a PDFGlyphVector on the page, by converting
it into the equivalent text drawing operations (Tj and the like).

Phew! That was a lot of information. Just to be clear, the current patch is not
compatible with subsetting without making some changes.

That way PDFont#encode(String text) can stay non-final :)


[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table

2018-04-14 Thread John Hewson (JIRA)

[
https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438540#comment-16438540
]

John Hewson edited comment on PDFBOX-4189 at 4/15/18 12:47 AM:
---

That way PDFont#encode(String text) can stay non-final :)

*Subsetting*: We currently track which glyphs need to be included in a subset
by using their Unicode code point, but with GSUB enabled we will have to keep
track of some substituted glyphs via their glyph id (GID), because the glyphs
which result from a substitution don't necessarily have their own code points
(and so have no entry in the cmap table). This should be easy to add to
TTFSubsetter as it already tracks glyph ids internally, we just need the
ability to pass them in too. Then PDPageContentStream#showText will be
responsible for passing the glyph ids. But now we need showText to know about
those glyph ids, which leads me to

*Glyph IDs:* The JDK represents text which has been through OpenType layout as
a
[GlyphVector|https://docs.oracle.com/javase/7/docs/api/java/awt/font/GlyphVector.html]
which encapsulates substitutions via GID and positioning via a transform
associated with each glyph. PDFBox might want to do something similar, I think
it would even be ok to add this to PDType0Font (because I'm suggesting a
specific OpenType API so it doesn't interfere with our PDType0Font's
non-OpenType assumption) in the form of a method such as: {{PDFGlyphVector
layout(String text)}} which is called from PDPageContentStream#showText. I also
think it would be fine to use instanceof to detect this case, because only
PDType0Font need have this capability. I'm assuming PDFGlyphVector is our own
very simple version of the JDK's GlyphVector, which is effectively just a
vector of (gid, dx, dy) tuples. Then all that PDPageContentStream#showText
needs to know how to do is to draw a PDFGlyphVector on the page, by converting
it into the equivalent text drawing operations (Tj and the like).

Phew! That was a lot of information. Just to be clear, the current patch is not
compatible with subsetting without making some changes.

That way PDFont#encode(String text) can stay non-final :)

*Technical Background*: In general, OpenType layouts consist of glyph
_substitutions_ (via GSUB) and _positionings_ (via GPOS). Obviously it's not
possible to handle positionings in PDFont#encode(), so that helps explain why
showText() is the right place for OpenType, as showText performs both
positioning and encoding.
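
As a rough illustration of the substitution half of this, the sketch below replaces sequences of glyph ids with a single glyph id, trying the longest match first. It is a simplification under stated assumptions: real GSUB tables have several lookup types, scripts and features, and {{GsubSketch}} with its methods is a hypothetical name, not FontBox code. Positioning (GPOS) is not modelled at all.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: applies sequence-to-glyph substitutions to glyph ids. */
final class GsubSketch
{
    private final Map<List<Integer>, Integer> substitutions = new HashMap<>();
    private int maxLength = 1;

    /** Registers that the given component glyphs are replaced by one glyph. */
    void addLigature(int replacementGid, Integer... componentGids)
    {
        substitutions.put(new ArrayList<>(Arrays.asList(componentGids)), replacementGid);
        maxLength = Math.max(maxLength, componentGids.length);
    }

    /** Returns a new glyph id list with the longest matches substituted. */
    List<Integer> apply(List<Integer> gids)
    {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        while (i < gids.size())
        {
            boolean matched = false;
            int longest = Math.min(maxLength, gids.size() - i);
            for (int len = longest; len >= 1 && !matched; len--)
            {
                Integer replacement = substitutions.get(gids.subList(i, i + len));
                if (replacement != null)
                {
                    out.add(replacement);
                    i += len;
                    matched = true;
                }
            }
            if (!matched)
            {
                // No rule starts here, so the glyph passes through unchanged.
                out.add(gids.get(i));
                i++;
            }
        }
        return out;
    }
}
```

Because the output is a list of glyph ids rather than characters, it cannot be passed back through a String-based encode method, which is consistent with doing this work at the showText level.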

> Enable rendering of Indian languages, by reading and utilizing the GSUB table
> --

[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table

2018-04-14 Thread John Hewson (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438540#comment-16438540
 ] 

John Hewson edited comment on PDFBOX-4189 at 4/15/18 12:17 AM:
---

Hi guys, this is a really welcome contribution, thank you. With regards to 
PDFont#encode(String text) being non-final I can add some insight as I was the 
original designer of our current PDFont#encode mechanism.

Basically, the PDFont classes are designed to represent fonts identically to 
how they are represented when embedded in PDF files. So there's no support for 
OpenType, by design. A Type0 font knows nothing about OpenType.

So how can we use OpenType in PDFBox? The answer is that we do it one layer of 
abstraction up, during text _layout_ instead of text _encoding_. So you 
want to put your glyph substitution code inside PDPageContentStream#showText, 
or more precisely inside 
[PDPageContentStream#showTextInternal|https://github.com/apache/pdfbox/blob/7e721643c0b1fca9fdc349f78431f36e68abc097/pdfbox/src/main/java/org/apache/pdfbox/contentstream/PDAbstractContentStream.java#L256].

That way PDFont#encode(String text) can stay non-final :)

*Technical Background*: In general, OpenType layouts consist of 
glyph _substitutions_ (via GSUB) and _positionings_ (via GPOS). Obviously it's 
not possible to handle positionings in PDFont#encode(), so that helps explain 
why showText() is the right place for OpenType, as showText performs both 
positioning and encoding.


was (Author: jahewson):
Hi guys, this is a really welcome contribution, thank you. With regards to 
PDFont#encode(String text) being non-final I can add some insight as I was the 
original designer of our current PDFont#encode mechanism.

Basically, the PDFont classes are designed to represent fonts identically to 
how they are represented when embedded in PDF files. So there's no support for 
OpenType, by design. A Type0 font knows nothing about OpenType.

So how can we use OpenType in PDFBox? The answer is that we do it one layer of 
abstraction up, during text _layout_ instead of text _encoding_*_._* So you 
want to put your glyph substitution code inside PDPageContentStream#showText, 
actually you want 
[PDPageContentStream#showTextInternal|https://github.com/apache/pdfbox/blob/7e721643c0b1fca9fdc349f78431f36e68abc097/pdfbox/src/main/java/org/apache/pdfbox/contentstream/PDAbstractContentStream.java#L256].

That way PDFont#encode(String text) can stay non-final :)

In general, OpenType layouts consist of glyph_substitutions_ (via GSUB) and 
_positionings_ (via GPOS). Obviously it's not possible to handle positionings 
in PDFont#encode(), so that helps explain why showText() is the right place for 
OpenType, as showText performs both positioning and encoding.

> Enable rendering of Indian languages, by reading and utilizing the GSUB table
> -
>
> Key: PDFBOX-4189
> URL: https://issues.apache.org/jira/browse/PDFBOX-4189
> Project: PDFBox
>  Issue Type: New Feature
>  Components: FontBox, PDModel
>Reporter: Palash Ray
>Priority: Major
> Attachments: Bengali-text-after.pdf, Bengali-text-before.pdf
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> Implemented proper rendering of Indian languages, which need extensive Glyph 
> substitution. The GSUB table has been read and used effectively to replace 
> some compound words with their respective Glyphs. All tests are passing. I 
> have tested this for the Bengali font. Please review these changes and let me 
> know if it makes sense to incorporate these.




[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table

2018-04-14 Thread John Hewson (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438540#comment-16438540
 ] 

John Hewson edited comment on PDFBOX-4189 at 4/15/18 12:17 AM:
---

Hi guys, this is a really welcome contribution, thank you. With regards to 
PDFont#encode(String text) being non-final I can add some insight as I was the 
original designer of our current PDFont#encode mechanism.

Basically, the PDFont classes are designed to represent fonts identically to 
how they are represented when embedded in PDF files. So there's no support for 
OpenType, by design. A Type0 font knows nothing about OpenType.

So how can we use OpenType in PDFBox? The answer is that we do it one layer of 
abstraction up, during text _layout_ instead of text _encoding_. So you 
want to put your glyph substitution code inside PDPageContentStream#showText, 
or more precisely inside 
[PDPageContentStream#showTextInternal|https://github.com/apache/pdfbox/blob/7e721643c0b1fca9fdc349f78431f36e68abc097/pdfbox/src/main/java/org/apache/pdfbox/contentstream/PDAbstractContentStream.java#L256].

That way PDFont#encode(String text) can stay non-final :)

*Technical Background*: In general, OpenType layouts consist of glyph 
_substitutions_ (via GSUB) and _positionings_ (via GPOS). Obviously it's not 
possible to handle positionings in PDFont#encode(), so that helps explain why 
showText() is the right place for OpenType, as showText performs both 
positioning and encoding.


was (Author: jahewson):
Hi guys, this is a really welcome contribution, thank you. With regards to 
PDFont#encode(String text) being non-final I can add some insight as I was the 
original designer of our current PDFont#encode mechanism.

Basically, the PDFont classes are designed to represent fonts identically to 
how they are represented when embedded in PDF files. So there's no support for 
OpenType, by design. A Type0 font knows nothing about OpenType.

So how can we use OpenType in PDFBox? The answer is that we do it one layer of 
abstraction up, during text _layout_ instead of text _encoding_*_._* So you 
want to put your glyph substitution code inside PDPageContentStream#showText, 
actually you want 
[PDPageContentStream#showTextInternal|https://github.com/apache/pdfbox/blob/7e721643c0b1fca9fdc349f78431f36e68abc097/pdfbox/src/main/java/org/apache/pdfbox/contentstream/PDAbstractContentStream.java#L256].

That way PDFont#encode(String text) can stay non-final :)

*Technical Background*: In general, OpenType layouts consist of 
glyph_substitutions_ (via GSUB) and _positionings_ (via GPOS). Obviously it's 
not possible to handle positionings in PDFont#encode(), so that helps explain 
why showText() is the right place for OpenType, as showText performs both 
positioning and encoding.

> Enable rendering of Indian languages, by reading and utilizing the GSUB table
> -
>
> Key: PDFBOX-4189
> URL: https://issues.apache.org/jira/browse/PDFBOX-4189
> Project: PDFBox
>  Issue Type: New Feature
>  Components: FontBox, PDModel
>Reporter: Palash Ray
>Priority: Major
> Attachments: Bengali-text-after.pdf, Bengali-text-before.pdf
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> Implemented proper rendering of Indian languages, which need extensive Glyph 
> substitution. The GSUB table has been read and used effectively to replace 
> some compound words with their respective Glyphs. All tests are passing. I 
> have tested this for the Bengali font. Please review these changes and let me 
> know if it makes sense to incorporate these.




[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table

2018-04-14 Thread John Hewson (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438540#comment-16438540
 ] 

John Hewson edited comment on PDFBOX-4189 at 4/15/18 12:16 AM:
---

Hi guys, this is a really welcome contribution, thank you. With regards to 
PDFont#encode(String text) being non-final I can add some insight as I was the 
original designer of our current PDFont#encode mechanism.

Basically, the PDFont classes are designed to represent fonts identically to 
how they are represented when embedded in PDF files. So there's no support for 
OpenType, by design. A Type0 font knows nothing about OpenType.

So how can we use OpenType in PDFBox? The answer is that we do it one layer of 
abstraction up, during text _layout_ instead of text _encoding_. So you 
want to put your glyph substitution code inside PDPageContentStream#showText, 
or more precisely inside 
[PDPageContentStream#showTextInternal|https://github.com/apache/pdfbox/blob/7e721643c0b1fca9fdc349f78431f36e68abc097/pdfbox/src/main/java/org/apache/pdfbox/contentstream/PDAbstractContentStream.java#L256].

That way PDFont#encode(String text) can stay non-final :)

In general, OpenType layouts consist of glyph _substitutions_ (via GSUB) and 
_positionings_ (via GPOS). Obviously it's not possible to handle positionings 
in PDFont#encode(), so that helps explain why showText() is the right place for 
OpenType, as showText performs both positioning and encoding.


was (Author: jahewson):
Hi guys, this is a really welcome contribution, thank you. With regards to 
PDFont#encode(String text) being non-final I can add some insight as I was the 
original designer of our current PDFont#encode mechanism.

Basically, the PDFont classes are designed to represent fonts identically to 
how they are represented when embedded in PDF files. So there's no support for 
OpenType, by design. A Type0 font knows nothing about OpenType.

So how can we use OpenType in PDFBox? The answer is that we do it one layer of 
abstraction up, during text _layout_ instead of text _encoding_*_._* So you 
want to put your glyph substitution code inside PDPageContentStream#showText, 
actually you want 
[PDPageContentStream#showTextInternal|https://github.com/apache/pdfbox/blob/7e721643c0b1fca9fdc349f78431f36e68abc097/pdfbox/src/main/java/org/apache/pdfbox/contentstream/PDAbstractContentStream.java#L256].

That way PDFont#encode(String text) can stay non-final :)

> Enable rendering of Indian languages, by reading and utilizing the GSUB table
> -
>
> Key: PDFBOX-4189
> URL: https://issues.apache.org/jira/browse/PDFBOX-4189
> Project: PDFBox
>  Issue Type: New Feature
>  Components: FontBox, PDModel
>Reporter: Palash Ray
>Priority: Major
> Attachments: Bengali-text-after.pdf, Bengali-text-before.pdf
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> Implemented proper rendering of Indian languages, which need extensive Glyph 
> substitution. The GSUB table has been read and used effectively to replace 
> some compound words with their respective Glyphs. All tests are passing. I 
> have tested this for the Bengali font. Please review these changes and let me 
> know if it makes sense to incorporate these.




[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table

2018-04-14 Thread John Hewson (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438540#comment-16438540
 ] 

John Hewson edited comment on PDFBOX-4189 at 4/15/18 12:11 AM:
---

Hi guys, this is a really welcome contribution, thank you. With regards to 
PDFont#encode(String text) being non-final I can add some insight as I was the 
original designer of our current PDFont#encode mechanism.

Basically, the PDFont classes are designed to represent fonts identically to 
how they are represented when embedded in PDF files. So there's no support for 
OpenType, by design. A Type0 font knows nothing about OpenType.

So how can we use OpenType in PDFBox? The answer is that we do it one layer of 
abstraction up, during text _layout_ instead of text _encoding_. So you 
want to put your glyph substitution code inside PDPageContentStream#showText, 
or more precisely inside 
[PDPageContentStream#showTextInternal|https://github.com/apache/pdfbox/blob/7e721643c0b1fca9fdc349f78431f36e68abc097/pdfbox/src/main/java/org/apache/pdfbox/contentstream/PDAbstractContentStream.java#L256].

That way PDFont#encode(String text) can stay non-final :)


was (Author: jahewson):
Hi guys, this is a really welcome contribution, thank you. With regards to 
PDFont#encode(String text) being non-final I can add some insight as I was the 
original designer of our current PDFont#encode mechanism.

Basically, the PDFont classes are designed to represent fonts identically to 
how they are represented when embedded in PDF files. So there's no support for 
OpenType, by design. A Type0 font knows nothing about OpenType (by design).

So how can we use OpenType in PDFBox? The answer is that we do it one layer of 
abstraction up, during text _layout_ instead of text _encoding_*_._* So you 
want to put your glyph substitution code inside PDPageContentStream#showText, 
actually you want 
[PDPageContentStream#showTextInternal|https://github.com/apache/pdfbox/blob/7e721643c0b1fca9fdc349f78431f36e68abc097/pdfbox/src/main/java/org/apache/pdfbox/contentstream/PDAbstractContentStream.java#L256].

That way PDFont#encode(String text) can stay non-final :)

> Enable rendering of Indian languages, by reading and utilizing the GSUB table
> -
>
> Key: PDFBOX-4189
> URL: https://issues.apache.org/jira/browse/PDFBOX-4189
> Project: PDFBox
>  Issue Type: New Feature
>  Components: FontBox, PDModel
>Reporter: Palash Ray
>Priority: Major
> Attachments: Bengali-text-after.pdf, Bengali-text-before.pdf
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> Implemented proper rendering of Indian languages, which need extensive Glyph 
> substitution. The GSUB table has been read and used effectively to replace 
> some compound words with their respective Glyphs. All tests are passing. I 
> have tested this for the Bengali font. Please review these changes and let me 
> know if it makes sense to incorporate these.



