date:20180414

[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table

2018-04-14 Thread John Hewson (JIRA)

[
https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438540#comment-16438540
]

John Hewson edited comment on PDFBOX-4189 at 4/15/18 1:04 AM:
--

Hi guys, this is a really welcome contribution, thank you. With regard to making
PDFont#encode(String text) non-final, I can add some insight, as I was the
original designer of our current PDFont#encode mechanism.

Basically, the PDFont classes are designed to represent fonts exactly as they
are represented when embedded in PDF files. So there's no support for
OpenType, by design. A Type0 font knows nothing about OpenType (but we can
relax this a bit, as I explain below).

So how can we use OpenType in PDFBox? The answer is that we do it one layer of
abstraction up, during text _layout_ instead of text _encoding_. So you want to
put your glyph substitution code inside PDPageContentStream#showText, or more
precisely inside
[PDPageContentStream#showTextInternal|https://github.com/apache/pdfbox/blob/7e721643c0b1fca9fdc349f78431f36e68abc097/pdfbox/src/main/java/org/apache/pdfbox/contentstream/PDAbstractContentStream.java#L256].

That way PDFont#encode(String text) can stay final :)

*OpenType*: In general, OpenType layouts consist of glyph _substitutions_ (via
GSUB) and _positionings_ (via GPOS). Obviously it's not possible to handle
positionings in PDFont#encode(), which helps explain why showText() is the
right place for OpenType: showText performs both positioning and encoding.
We also need to keep track of glyphs for subsetting, which is not possible in
encode().
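A GSUB ligature lookup works on glyph IDs, not on Unicode code points, and it can turn several glyphs into one (for example a Bengali conjunct such as ক + ্ + ষ → ক্ষ). The sketch below is hypothetical, not the FontBox or PDFBox API: the class name `LigatureSubstitution` and the GID values are made up for illustration. It shows why the result cannot come from encoding text one code point at a time. It uses a longest-match-first rule as a simplification; real GSUB ligature sets are keyed by the first glyph and tried in the order the font lists them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of a GSUB-style ligature substitution (not a PDFBox API).
 * Input and output are glyph IDs (GIDs), not Unicode code points.
 */
final class LigatureSubstitution {
    // Maps a sequence of component GIDs to the GID of the ligature glyph.
    private final Map<List<Integer>, Integer> ligatures = new HashMap<>();
    private int maxComponents = 2;

    void addLigature(List<Integer> components, int ligatureGid) {
        if (components.size() < 2) {
            throw new IllegalArgumentException("a ligature needs at least two components");
        }
        ligatures.put(new ArrayList<>(components), ligatureGid);
        maxComponents = Math.max(maxComponents, components.size());
    }

    /** Replaces the longest matching component sequence at each position. */
    List<Integer> apply(List<Integer> gids) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        while (i < gids.size()) {
            int consumed = 0;
            // Try the longest candidate first, down to two components.
            for (int len = Math.min(maxComponents, gids.size() - i); len >= 2; len--) {
                Integer ligature = ligatures.get(gids.subList(i, i + len));
                if (ligature != null) {
                    out.add(ligature);
                    consumed = len;
                    break;
                }
            }
            if (consumed == 0) {
                out.add(gids.get(i)); // no substitution: copy the glyph through unchanged
                consumed = 1;
            }
            i += consumed;
        }
        return out;
    }
}
```

For instance, with a ligature (10, 20) → 99 registered, the GID sequence 5, 10, 20, 7 becomes 5, 99, 7: four input glyphs, three output glyphs.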

*Subsetting*: We currently track which glyphs need to be included in a subset
by their Unicode code points, but with GSUB enabled we will have to track some
substituted glyphs by their glyph ID (GID), because the glyphs that result
from a substitution don't necessarily have their own code points (no entry in
the cmap table). This should be easy to add to TTFSubsetter, as it already
tracks glyph IDs internally; we just need the ability to pass them in too,
e.g. addGlyphId(int). Then PDPageContentStream#showText will be responsible
for passing the glyph IDs. But now showText needs to know about those glyph
IDs, which leads me to

*Glyph IDs:* The JDK represents text which has been through OpenType layout as
a [GlyphVector|https://docs.oracle.com/javase/7/docs/api/java/awt/font/GlyphVector.html],
which encapsulates substitutions via GIDs and positioning via a transform
associated with each glyph. PDFBox might want to do something similar. I think
it would even be OK to add this to PDType0Font (because I'm suggesting a
specific OpenType API, it doesn't interfere with PDType0Font's non-OpenType
assumption) in the form of a method such as {{final PDFGlyphVector
layout(String text)}}, which is called from PDPageContentStream#showText
instead of encode(text). I also think it would be fine to use instanceof to
detect this case, because only PDType0Font needs this capability. I'm assuming
PDFGlyphVector is our own very simple version of the JDK's GlyphVector, which
is effectively just a vector of (gid, dx, dy) tuples. Then all that
PDPageContentStream#showText needs to know is how to draw a PDFGlyphVector on
the page, by converting it into the equivalent text-drawing operations (Tj and
the like). Because this patch is just for GSUB, all of those positioning values
can be zero, and we need not implement any actual glyph positioning in
showText() yet :). Thus PDFGlyphVector will serve simply as an array of GIDs.
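A minimal sketch of the proposed PDFGlyphVector, assuming it is just a list of (gid, dx, dy) tuples as described above. PDFGlyphVector does not exist in PDFBox; the class, its `add`, `glyphIds` and `toTjOperation` methods are hypothetical. It also assumes a Type0 font using the Identity-H CMap with CIDs equal to GIDs, so each glyph is written as a 2-byte big-endian hex value. Since this patch is GSUB-only, any non-zero offset is rejected rather than turned into positioning operators.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical PDFGlyphVector (not an existing PDFBox class): the result of
 * OpenType layout as a list of (gid, dx, dy) tuples.
 */
final class PDFGlyphVector {
    /** One laid-out glyph: its glyph ID plus a positioning offset. */
    static final class Glyph {
        final int gid;
        final float dx;
        final float dy;

        Glyph(int gid, float dx, float dy) {
            this.gid = gid;
            this.dx = dx;
            this.dy = dy;
        }
    }

    private final List<Glyph> glyphs = new ArrayList<>();

    void add(int gid, float dx, float dy) {
        glyphs.add(new Glyph(gid, dx, dy));
    }

    /** Distinct GIDs that showText would pass on to the subsetter. */
    Set<Integer> glyphIds() {
        Set<Integer> gids = new LinkedHashSet<>();
        for (Glyph g : glyphs) {
            gids.add(g.gid);
        }
        return gids;
    }

    /**
     * Builds a Tj operation, assuming Identity-H with CID == GID, so each
     * glyph becomes a 2-byte big-endian hex value. GSUB only: non-zero
     * offsets are rejected because positioning is not implemented.
     */
    String toTjOperation() {
        StringBuilder hex = new StringBuilder("<");
        for (Glyph g : glyphs) {
            if (g.dx != 0f || g.dy != 0f) {
                throw new UnsupportedOperationException("GPOS positioning is not implemented");
            }
            if (g.gid < 0 || g.gid > 0xFFFF) {
                throw new IllegalArgumentException("GID out of range: " + g.gid);
            }
            hex.append(String.format("%04X", g.gid));
        }
        return hex.append("> Tj").toString();
    }
}
```

With glyphs 12, 250 and 12 (all offsets zero), toTjOperation() yields `<000C00FA000C> Tj` and glyphIds() yields {12, 250}, which is the set showText would hand to the subsetter. Keeping the positioning fields in the tuple, even though they are always zero for now, is what would let GPOS support be added later (e.g. via TJ adjustments) without changing the shape of the API.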

Phew! That was a lot of information. Just to be clear, the current patch is not
compatible with subsetting without making some changes. P.S. Make sure any new
APIs are {{final}}. All of the suggestions above consist of adding only
non-breaking APIs, which is nice.

Thanks!


[jira] [Commented] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table

2018-04-14 Thread Palash Ray (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438487#comment-16438487
 ] 

Palash Ray commented on PDFBOX-4189:


I have pushed some changes which take care of most of the issues that you have
pointed out, except:
 # subsetting
 # BengaliPdfGenerationHelloWorld should be integrated into the 
EmbeddedFonts.java example

I will take care of these as well. Meanwhile, please let me know if any other 
changes are needed.

 

Thanks,

Palash.

> Enable rendering of Indian languages, by reading and utilizing the GSUB table
> -
>
> Key: PDFBOX-4189
> URL: https://issues.apache.org/jira/browse/PDFBOX-4189
> Project: PDFBox
>  Issue Type: New Feature
>  Components: FontBox, PDModel
>Reporter: Palash Ray
>Priority: Major
> Attachments: Bengali-text-after.pdf, Bengali-text-before.pdf
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> Implemented proper rendering of Indian languages, which need extensive glyph 
> substitution. The GSUB table has been read and used to replace some compound 
> characters (conjuncts) with their respective glyphs. All tests are passing. I 
> have tested this with a Bengali font. Please review these changes and let me 
> know if it makes sense to incorporate them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
