[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table

2018-04-14 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438540#comment-16438540
 ] 

John Hewson edited comment on PDFBOX-4189 at 4/15/18 1:04 AM:
--

Hi guys, this is a really welcome contribution, thank you. With regards to 
PDFont#encode(String text) being non-final I can add some insight as I was the 
original designer of our current PDFont#encode mechanism.

Basically, the PDFont classes are designed to represent fonts identically to 
how they are represented when embedded in PDF files. So there's no support for 
OpenType, by design. A Type0 font knows nothing about OpenType (but we can 
relax this a bit, as I explain below).

So how can we use OpenType in PDFBox? The answer is that we do it one layer of 
abstraction up, during text _layout_ instead of text _encoding_. So you 
want to put your glyph substitution code inside PDPageContentStream#showText, 
or more precisely inside 
[PDPageContentStream#showTextInternal|https://github.com/apache/pdfbox/blob/7e721643c0b1fca9fdc349f78431f36e68abc097/pdfbox/src/main/java/org/apache/pdfbox/contentstream/PDAbstractContentStream.java#L256].

That way PDFont#encode(String text) can stay non-final :)

*OpenType*: In general, OpenType layouts consist of glyph _substitutions_ (via 
GSUB) and _positionings_ (via GPOS). Obviously it's not possible to handle 
positionings in PDFont#encode(), so that helps explain why showText() is the 
right place for OpenType, as showText performs both positioning and encoding. 
We also need to keep track of glyphs for subsetting, which is not possible in 
encode().

*Subsetting*: We currently track which glyphs need to be included in a subset 
by using their Unicode code point, but with GSUB enabled we will have to keep 
track of some substituted glyphs via their glyph id (GID), because the glyphs 
which result from a substitution don't necessarily have their own code points 
(no entry in the cmap table). This should be easy to add to TTFSubsetter as it 
already tracks glyph ids internally, we just need the ability to pass them in 
too, e.g. addGlyphId(integer). Then PDPageContentStream#showText will be 
responsible for passing the glyph ids. But now we need showText to know about 
those glyph ids, which leads me to

*Glyph IDs:* The JDK represents text which has been through OpenType layout as 
a 
[GlyphVector|https://docs.oracle.com/javase/7/docs/api/java/awt/font/GlyphVector.html]
 which encapsulates substitutions via GID and positioning via a transform 
associated with each glyph. PDFBox might want to do something similar, I think 
it would even be ok to add this to PDType0Font (because I'm suggesting a 
specific OpenType API so it doesn't interfere with our PDType0Font's 
non-OpenType assumption) in the form of a method such as: {{final 
PDFGlyphVector layout(String text)}} which is called from 
PDPageContentStream#showText instead of encode(text). I also think it would be 
fine to use instanceof to detect this case, because only PDType0Font need have 
this capability. I'm assuming PDFGlyphVector is our own very simple version of 
the JDK's GlyphVector, which is effectively just a vector of (gid, dx, dy) 
tuples. Then all that PDPageContentStream#showText needs to know how to do is 
to draw a PDFGlyphVector on the page, by converting it into the equivalent text 
drawing operations (Tj and the like). Because this patch is just for GSUB, all 
of those positioning values can just be zero, and we need not implement any 
actual glyph positioning in showText() yet :). Thus PDFGlyphVector will serve 
simply as an array of GIDs.
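
As a rough illustration of the glyph vector idea, here is a minimal Java sketch. Everything in it is an assumption rather than existing PDFBox API: the shape of {{PDFGlyphVector}}, the {{add}} and {{toHexString}} methods, and the choice to write each GID as a two-byte code, which only holds for a Type0 font whose encoding maps each code directly to the GID (for example Identity-H with an identity CID-to-GID mapping).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical simplified glyph vector: an ordered list of (gid, dx, dy) tuples. */
final class PDFGlyphVector {

    /** One glyph with its offsets; dx and dy stay zero while only GSUB is applied. */
    static final class Glyph {
        final int gid;
        final float dx;
        final float dy;

        Glyph(int gid, float dx, float dy) {
            this.gid = gid;
            this.dx = dx;
            this.dy = dy;
        }
    }

    private final List<Glyph> glyphs = new ArrayList<>();

    /** Adds a substituted glyph with no positioning adjustment. */
    void add(int gid) {
        add(gid, 0f, 0f);
    }

    /** Adds a glyph with explicit offsets (unused until GPOS is handled). */
    void add(int gid, float dx, float dy) {
        if (gid < 0 || gid > 0xFFFF) {
            throw new IllegalArgumentException("GID out of range: " + gid);
        }
        glyphs.add(new Glyph(gid, dx, dy));
    }

    /** Read-only view of the glyphs in layout order. */
    List<Glyph> glyphs() {
        return Collections.unmodifiableList(glyphs);
    }

    /**
     * Renders the GIDs as a PDF hex string operand for Tj, two bytes per glyph.
     * Assumes the font's codes are identical to its GIDs.
     */
    String toHexString() {
        StringBuilder sb = new StringBuilder("<");
        for (Glyph g : glyphs) {
            sb.append(String.format("%04X", g.gid));
        }
        return sb.append('>').toString();
    }
}
```

With this shape, showText would only need to walk the vector and emit the hex string followed by Tj; once positioning is added, non-zero offsets could be turned into TJ adjustments instead.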

Phew! That was a lot of information. Just to be clear, the current patch is not 
compatible with subsetting without making some changes. P.S. Make sure any new 
APIs are {{final}}. All of the suggestions above consist of adding only 
non-breaking APIs, which is nice.
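
To make the subsetting change concrete, a minimal sketch of tracking glyphs by both code point and GID might look like the following. {{SubsetGlyphTracker}} is a hypothetical stand-in and not TTFSubsetter itself; the cmap is modelled as a plain map, keeping GID 0 is a choice made for this sketch, and {{addGlyphId}} is simply the method name suggested above.

```java
import java.util.Collections;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical stand-in for a subsetter; not the real TTFSubsetter. */
final class SubsetGlyphTracker {
    // Assumed lookup from Unicode code point to glyph id, standing in for the cmap table.
    private final Map<Integer, Integer> cmap;
    private final SortedSet<Integer> glyphIds = new TreeSet<>();

    SubsetGlyphTracker(Map<Integer, Integer> cmap) {
        this.cmap = cmap;
        glyphIds.add(0); // this sketch always keeps GID 0 (.notdef)
    }

    /** Code-point based tracking: resolve the code point through the cmap. */
    void add(int codePoint) {
        Integer gid = cmap.get(codePoint);
        if (gid != null) {
            glyphIds.add(gid);
        }
    }

    /** Proposed addition: track a substituted glyph that has no code point of its own. */
    void addGlyphId(int gid) {
        glyphIds.add(gid);
    }

    /** Glyph ids that the subset must contain, in ascending order. */
    SortedSet<Integer> glyphIdsToKeep() {
        return Collections.unmodifiableSortedSet(glyphIds);
    }
}
```

Here showText would call {{addGlyphId}} for every GID in the laid-out text, so glyphs produced by a substitution end up in the subset even though no code point maps to them.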

Thanks!


[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table

2018-04-14 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438540#comment-16438540
 ] 

John Hewson edited comment on PDFBOX-4189 at 4/15/18 12:57 AM:
---

Hi guys, this is a really welcome contribution, thank you. With regards to 
PDFont#encode(String text) being non-final I can add some insight as I was the 
original designer of our current PDFont#encode mechanism.

Basically, the PDFont classes are designed to represent fonts identically to 
how they are represented when embedded in PDF files. So there's no support for 
OpenType, by design. A Type0 font knows nothing about OpenType (but we can 
relax this a bit, as I explain below).

So how can we use OpenType in PDFBox? The answer is that we do it one layer of 
abstraction up, during text _layout_ instead of text _encoding_. So you 
want to put your glyph substitution code inside PDPageContentStream#showText, 
or more precisely inside 
[PDPageContentStream#showTextInternal|https://github.com/apache/pdfbox/blob/7e721643c0b1fca9fdc349f78431f36e68abc097/pdfbox/src/main/java/org/apache/pdfbox/contentstream/PDAbstractContentStream.java#L256].

That way PDFont#encode(String text) can stay non-final :)

*OpenType*: In general, OpenType layouts consist of glyph _substitutions_ (via 
GSUB) and _positionings_ (via GPOS). Obviously it's not possible to handle 
positionings in PDFont#encode(), so that helps explain why showText() is the 
right place for OpenType, as showText performs both positioning and encoding. 
We also need to keep track of glyphs for subsetting, which is not possible in 
encode().

*Subsetting*: We currently track which glyphs need to be included in a subset 
by using their Unicode code point, but with GSUB enabled we will have to keep 
track of some substituted glyphs via their glyph id (GID), because the glyphs 
which result from a substitution don't necessarily have their own code points 
(no entry in the cmap table). This should be easy to add to TTFSubsetter as it 
already tracks glyph ids internally, we just need the ability to pass them in 
too, e.g. addGlyphId(integer). Then PDPageContentStream#showText will be 
responsible for passing the glyph ids. But now we need showText to know about 
those glyph ids, which leads me to

*Glyph IDs:* The JDK represents text which has been through OpenType layout as 
a 
[GlyphVector|https://docs.oracle.com/javase/7/docs/api/java/awt/font/GlyphVector.html]
 which encapsulates substitutions via GID and positioning via a transform 
associated with each glyph. PDFBox might want to do something similar, I think 
it would even be ok to add this to PDType0Font (because I'm suggesting a 
specific OpenType API so it doesn't interfere with our PDType0Font's 
non-OpenType assumption) in the form of a method such as: {{final 
PDFGlyphVector layout(String text)}} which is called from 
PDPageContentStream#showText instead of encode(text). I also think it would be 
fine to use instanceof to detect this case, because only PDType0Font need have 
this capability. I'm assuming PDFGlyphVector is our own very simple version of 
the JDK's GlyphVector, which is effectively just a vector of (gid, dx, dy) 
tuples. Then all that PDPageContentStream#showText needs to know how to do is 
to draw a PDFGlyphVector on the page, by converting it into the equivalent text 
drawing operations (Tj and the like). Because this patch is just for GSUB, all 
of those positioning values can just be zero, and we need not implement any 
actual glyph positioning in showText() yet :). Thus PDFGlyphVector will serve 
simply as an array of GIDs.

Phew! That was a lot of information. Just to be clear, the current patch is not 
compatible with subsetting without making some changes. P.S. Make sure any new 
APIs are {{final}}.


[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table

2018-04-14 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438540#comment-16438540
 ] 

John Hewson edited comment on PDFBOX-4189 at 4/15/18 12:52 AM:
---

Hi guys, this is a really welcome contribution, thank you. With regards to 
PDFont#encode(String text) being non-final I can add some insight as I was the 
original designer of our current PDFont#encode mechanism.

Basically, the PDFont classes are designed to represent fonts identically to 
how they are represented when embedded in PDF files. So there's no support for 
OpenType, by design. A Type0 font knows nothing about OpenType (but we can 
relax this a bit, as I explain below).

So how can we use OpenType in PDFBox? The answer is that we do it one layer of 
abstraction up, during text _layout_ instead of text _encoding_. So you 
want to put your glyph substitution code inside PDPageContentStream#showText, 
or more precisely inside 
[PDPageContentStream#showTextInternal|https://github.com/apache/pdfbox/blob/7e721643c0b1fca9fdc349f78431f36e68abc097/pdfbox/src/main/java/org/apache/pdfbox/contentstream/PDAbstractContentStream.java#L256].

That way PDFont#encode(String text) can stay non-final :)

*OpenType*: In general, OpenType layouts consist of glyph _substitutions_ (via 
GSUB) and _positionings_ (via GPOS). Obviously it's not possible to handle 
positionings in PDFont#encode(), so that helps explain why showText() is the 
right place for OpenType, as showText performs both positioning and encoding. 
We also need to keep track of glyphs for subsetting, which is not possible in 
encode().

*Subsetting*: We currently track which glyphs need to be included in a subset 
by using their Unicode code point, but with GSUB enabled we will have to keep 
track of some substituted glyphs via their glyph id (GID), because the glyphs 
which result from a substitution don't necessarily have their own code points 
(no entry in the cmap table). This should be easy to add to TTFSubsetter as it 
already tracks glyph ids internally, we just need the ability to pass them in 
too, e.g. addGlyphId(integer). Then PDPageContentStream#showText will be 
responsible for passing the glyph ids. But now we need showText to know about 
those glyph ids, which leads me to

*Glyph IDs:* The JDK represents text which has been through OpenType layout as 
a 
[GlyphVector|https://docs.oracle.com/javase/7/docs/api/java/awt/font/GlyphVector.html]
 which encapsulates substitutions via GID and positioning via a transform 
associated with each glyph. PDFBox might want to do something similar, I think 
it would even be ok to add this to PDType0Font (because I'm suggesting a 
specific OpenType API so it doesn't interfere with our PDType0Font's 
non-OpenType assumption) in the form of a method such as: {{PDFGlyphVector 
layout(String text)}} which is called from PDPageContentStream#showText instead 
of encode(text). I also think it would be fine to use instanceof to detect this 
case, because only PDType0Font need have this capability. I'm assuming 
PDFGlyphVector is our own very simple version of the JDK's GlyphVector, which 
is effectively just a vector of (gid, dx, dy) tuples. Then all that 
PDPageContentStream#showText needs to know how to do is to draw a 
PDFGlyphVector on the page, by converting it into the equivalent text drawing 
operations (Tj and the like).

Phew! That was a lot of information. Just to be clear, the current patch is not 
compatible with subsetting without making some changes.


[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table

2018-04-14 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438540#comment-16438540
 ] 

John Hewson edited comment on PDFBOX-4189 at 4/15/18 12:51 AM:
---

Hi guys, this is a really welcome contribution, thank you. With regards to 
PDFont#encode(String text) being non-final I can add some insight as I was the 
original designer of our current PDFont#encode mechanism.

Basically, the PDFont classes are designed to represent fonts identically to 
how they are represented when embedded in PDF files. So there's no support for 
OpenType, by design. A Type0 font knows nothing about OpenType (but we can 
relax this a bit, as I explain below).

So how can we use OpenType in PDFBox? The answer is that we do it one layer of 
abstraction up, during text _layout_ instead of text _encoding_. So you 
want to put your glyph substitution code inside PDPageContentStream#showText, 
or more precisely inside 
[PDPageContentStream#showTextInternal|https://github.com/apache/pdfbox/blob/7e721643c0b1fca9fdc349f78431f36e68abc097/pdfbox/src/main/java/org/apache/pdfbox/contentstream/PDAbstractContentStream.java#L256].

That way PDFont#encode(String text) can stay non-final :)

*OpenType*: In general, OpenType layouts consist of glyph _substitutions_ (via 
GSUB) and _positionings_ (via GPOS). Obviously it's not possible to handle 
positionings in PDFont#encode(), so that helps explain why showText() is the 
right place for OpenType, as showText performs both positioning and encoding. 
We also need to keep track of glyphs for subsetting, which is not possible in 
encode().

*Subsetting*: We currently track which glyphs need to be included in a subset 
by using their Unicode code point, but with GSUB enabled we will have to keep 
track of some substituted glyphs via their glyph id (GID), because the glyphs 
which result from a substitution don't necessarily have their own code points 
(no entry in the cmap table). This should be easy to add to TTFSubsetter as it 
already tracks glyph ids internally, we just need the ability to pass them in 
too, e.g. addGlyphId(integer). Then PDPageContentStream#showText will be 
responsible for passing the glyph ids. But now we need showText to know about 
those glyph ids, which leads me to

*Glyph IDs:* The JDK represents text which has been through OpenType layout as 
a 
[GlyphVector|https://docs.oracle.com/javase/7/docs/api/java/awt/font/GlyphVector.html]
 which encapsulates substitutions via GID and positioning via a transform 
associated with each glyph. PDFBox might want to do something similar, I think 
it would even be ok to add this to PDType0Font (because I'm suggesting a 
specific OpenType API so it doesn't interfere with our PDType0Font's 
non-OpenType assumption) in the form of a method such as: {{PDFGlyphVector 
layout(String text)}} which is called from PDPageContentStream#showText. I also 
think it would be fine to use instanceof to detect this case, because only 
PDType0Font need have this capability. I'm assuming PDFGlyphVector is our own 
very simple version of the JDK's GlyphVector, which is effectively just a 
vector of (gid, dx, dy) tuples. Then all that PDPageContentStream#showText 
needs to know how to do is to draw a PDFGlyphVector on the page, by converting 
it into the equivalent text drawing operations (Tj and the like).

Phew! That was a lot of information. Just to be clear, the current patch is not 
compatible with subsetting without making some changes.


[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table

2018-04-14 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438540#comment-16438540
 ] 

John Hewson edited comment on PDFBOX-4189 at 4/15/18 12:49 AM:
---

Hi guys, this is a really welcome contribution, thank you. With regards to 
PDFont#encode(String text) being non-final I can add some insight as I was the 
original designer of our current PDFont#encode mechanism.

Basically, the PDFont classes are designed to represent fonts identically to 
how they are represented when embedded in PDF files. So there's no support for 
OpenType, by design. A Type0 font knows nothing about OpenType (but we can 
relax this a bit, as I explain below).

So how can we use OpenType in PDFBox? The answer is that we do it one layer of 
abstraction up, during text _layout_ instead of text _encoding_. So you 
want to put your glyph substitution code inside PDPageContentStream#showText, 
or more precisely inside 
[PDPageContentStream#showTextInternal|https://github.com/apache/pdfbox/blob/7e721643c0b1fca9fdc349f78431f36e68abc097/pdfbox/src/main/java/org/apache/pdfbox/contentstream/PDAbstractContentStream.java#L256].

That way PDFont#encode(String text) can stay non-final :)

*OpenType*: In general, OpenType layouts consist of glyph _substitutions_ (via 
GSUB) and _positionings_ (via GPOS). Obviously it's not possible to handle 
positionings in PDFont#encode(), so that helps explain why showText() is the 
right place for OpenType, as showText performs both positioning and encoding. 
We also need to keep track of glyphs for subsetting, which is not possible in 
encode().

*Subsetting*: We currently track which glyphs need to be included in a subset 
by using their Unicode code point, but with GSUB enabled we will have to keep 
track of some substituted glyphs via their glyph id (GID), because the glyphs 
which result from a substitution don't necessarily have their own code points 
(no entry in the cmap table). This should be easy to add to TTFSubsetter as it 
already tracks glyph ids internally, we just need the ability to pass them in 
too. Then PDPageContentStream#showText will be responsible for passing the 
glyph ids. But now we need showText to know about those glyph ids, which leads 
me to

*Glyph IDs:* The JDK represents text which has been through OpenType layout as 
a 
[GlyphVector|https://docs.oracle.com/javase/7/docs/api/java/awt/font/GlyphVector.html]
 which encapsulates substitutions via GID and positioning via a transform 
associated with each glyph. PDFBox might want to do something similar, I think 
it would even be ok to add this to PDType0Font (because I'm suggesting a 
specific OpenType API so it doesn't interfere with our PDType0Font's 
non-OpenType assumption) in the form of a method such as: {{PDFGlyphVector 
layout(String text)}} which is called from PDPageContentStream#showText. I also 
think it would be fine to use instanceof to detect this case, because only 
PDType0Font need have this capability. I'm assuming PDFGlyphVector is our own 
very simple version of the JDK's GlyphVector, which is effectively just a 
vector of (gid, dx, dy) tuples. Then all that PDPageContentStream#showText 
needs to know how to do is to draw a PDFGlyphVector on the page, by converting 
it into the equivalent text drawing operations (Tj and the like).

Phew! That was a lot of information. Just to be clear, the current patch is not 
compatible with subsetting without making some changes.


[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table

2018-04-14 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438540#comment-16438540
 ] 

John Hewson edited comment on PDFBOX-4189 at 4/15/18 12:48 AM:
---

Hi guys, this is a really welcome contribution, thank you. With regards to 
PDFont#encode(String text) being non-final I can add some insight as I was the 
original designer of our current PDFont#encode mechanism.

Basically, the PDFont classes are designed to represent fonts identically to 
how they are represented when embedded in PDF files. So there's no support for 
OpenType, by design. A Type0 font knows nothing about OpenType (but we can 
relax this a bit, as I explain below).

So how can we use OpenType in PDFBox? The answer is that we do it one layer of 
abstraction up, during text _layout_ instead of text _encoding_. So you 
want to put your glyph substitution code inside PDPageContentStream#showText, 
or more precisely inside 
[PDPageContentStream#showTextInternal|https://github.com/apache/pdfbox/blob/7e721643c0b1fca9fdc349f78431f36e68abc097/pdfbox/src/main/java/org/apache/pdfbox/contentstream/PDAbstractContentStream.java#L256].

That way PDFont#encode(String text) can stay non-final :)

*OpenType*: In general, OpenType layouts consist of glyph _substitutions_ (via 
GSUB) and _positionings_ (via GPOS). Obviously it's not possible to handle 
positionings in PDFont#encode(), so that helps explain why showText() is the 
right place for OpenType, as showText performs both positioning and encoding. 
We also need to keep track of glyphs for subsetting, which is not possible in 
encode().

*Subsetting*: We currently track which glyphs need to be included in a subset 
by using their Unicode code point, but with GSUB enabled we will have to keep 
track of some substituted glyphs via their glyph id (GID), because the glyphs 
which result from a substitution don't necessarily have their own code points 
(and so have no entry in the cmap table). This should be easy to add to 
TTFSubsetter as it already tracks glyph ids internally, we just need the 
ability to pass them in too. Then PDPageContentStream#showText will be 
responsible for passing the glyph ids. But now we need showText to know about 
those glyph ids, which leads me to

*Glyph IDs:* The JDK represents text which has been through OpenType layout as 
a 
[GlyphVector|https://docs.oracle.com/javase/7/docs/api/java/awt/font/GlyphVector.html]
 which encapsulates substitutions via GID and positioning via a transform 
associated with each glyph. PDFBox might want to do something similar, I think 
it would even be ok to add this to PDType0Font (because I'm suggesting a 
specific OpenType API so it doesn't interfere with our PDType0Font's 
non-OpenType assumption) in the form of a method such as: {{PDFGlyphVector 
layout(String text)}} which is called from PDPageContentStream#showText. I also 
think it would be fine to use instanceof to detect this case, because only 
PDType0Font need have this capability. I'm assuming PDFGlyphVector is our own 
very simple version of the JDK's GlyphVector, which is effectively just a 
vector of (gid, dx, dy) tuples. Then all that PDPageContentStream#showText 
needs to know how to do is to draw a PDFGlyphVector on the page, by converting 
it into the equivalent text drawing operations (Tj and the like).

Phew! That was a lot of information. Just to be clear, the current patch is not 
compatible with subsetting without making some changes.


[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table

2018-04-14 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438540#comment-16438540
 ] 

John Hewson edited comment on PDFBOX-4189 at 4/15/18 12:47 AM:
---

Hi guys, this is a really welcome contribution, thank you. With regards to 
PDFont#encode(String text) being non-final I can add some insight as I was the 
original designer of our current PDFont#encode mechanism.

Basically, the PDFont classes are designed to represent fonts identically to 
how they are represented when embedded in PDF files. So there's no support for 
OpenType, by design. A Type0 font knows nothing about OpenType.

So how can we use OpenType in PDFBox? The answer is that we do it one layer of 
abstraction up, during text _layout_ instead of text _encoding_. So you 
want to put your glyph substitution code inside PDPageContentStream#showText, 
or more precisely inside 
[PDPageContentStream#showTextInternal|https://github.com/apache/pdfbox/blob/7e721643c0b1fca9fdc349f78431f36e68abc097/pdfbox/src/main/java/org/apache/pdfbox/contentstream/PDAbstractContentStream.java#L256].

That way PDFont#encode(String text) can stay non-final :)

*OpenType*: In general, OpenType layouts consist of glyph _substitutions_ (via 
GSUB) and _positionings_ (via GPOS). PDFont#encode() only returns encoded 
bytes, so it cannot handle positionings. That helps explain why showText() is 
the right place for OpenType, as showText performs both positioning and 
encoding. We also need to keep track of glyphs for subsetting, which is not 
possible in encode().
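
To make the substitution half concrete, here is a minimal Java sketch of a GSUB-style ligature substitution applied to a list of glyph IDs. It is a toy model, not FontBox code: the class name LigatureSubstitutionSketch, its methods, and the GIDs used with it are all hypothetical, and real GSUB lookups are organised by coverage tables and lookup order rather than by a simple longest-match map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a GSUB ligature substitution; hypothetical, not the FontBox API. */
final class LigatureSubstitutionSketch {
    // Maps a sequence of component GIDs to the single ligature GID replacing it.
    private final Map<List<Integer>, Integer> ligatures = new HashMap<>();
    private int maxLength = 1;

    /** Registers a ligature: the component GIDs, in order, collapse into ligatureGid. */
    void addLigature(int ligatureGid, Integer... componentGids) {
        ligatures.put(new ArrayList<>(Arrays.asList(componentGids)), ligatureGid);
        maxLength = Math.max(maxLength, componentGids.length);
    }

    /** Returns a new GID list with registered sequences replaced, longest match first. */
    List<Integer> apply(List<Integer> gids) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        while (i < gids.size()) {
            int consumed = 0;
            // Only sequences of two or more glyphs are considered in this sketch.
            for (int len = Math.min(maxLength, gids.size() - i); len >= 2; len--) {
                Integer ligature = ligatures.get(gids.subList(i, i + len));
                if (ligature != null) {
                    out.add(ligature);
                    consumed = len;
                    break;
                }
            }
            if (consumed == 0) {
                out.add(gids.get(i)); // no substitution applies; keep the glyph
                consumed = 1;
            }
            i += consumed;
        }
        return out;
    }
}
```

With a ligature 200 registered for the sequence 10, 20, 11, applying the sketch to [10, 20, 11, 30] yields [200, 30]. Glyph 200 is the kind of glyph that may have no code point of its own.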

*Subsetting*: We currently track which glyphs need to be included in a subset 
by using their Unicode code point, but with GSUB enabled we will have to keep 
track of some substituted glyphs via their glyph ID (GID), because the glyphs 
which result from a substitution don't necessarily have their own code points 
(and so have no entry in the cmap table). This should be easy to add to 
TTFSubsetter, as it already tracks glyph IDs internally; we just need the 
ability to pass them in too. Then PDPageContentStream#showText will be 
responsible for passing the glyph IDs. But now we need showText to know about 
those glyph IDs, which leads me to the next point.

*Glyph IDs:* The JDK represents text which has been through OpenType layout as 
a [GlyphVector|https://docs.oracle.com/javase/7/docs/api/java/awt/font/GlyphVector.html], 
which encapsulates substitutions via GID and positioning via a transform 
associated with each glyph. PDFBox might want to do something similar. I think 
it would even be OK to add this to PDType0Font (because I'm suggesting a 
specific OpenType API, so it doesn't interfere with PDType0Font's 
non-OpenType assumption) in the form of a method such as 
{{PDFGlyphVector layout(String text)}}, which is called from 
PDPageContentStream#showText. I also think it would be fine to use instanceof 
to detect this case, because only PDType0Font need have this capability. I'm 
assuming PDFGlyphVector is our own very simple version of the JDK's 
GlyphVector, which is effectively just a vector of (gid, dx, dy) tuples. Then 
all that PDPageContentStream#showText needs to know how to do is draw a 
PDFGlyphVector on the page, by converting it into the equivalent text drawing 
operations (Tj and the like).
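
A minimal Java sketch of what such a glyph vector could look like, and how showText might turn it into a TJ operator. Everything here is hypothetical: PdfGlyphVectorSketch and its methods are illustrative names, not PDFBox API. The sketch assumes the font is written with a two-byte Identity-H encoding in which the character code equals the GID, treats dx as a horizontal adjustment in thousandths of a text space unit applied before the glyph, and rejects a non-zero dy, since in horizontal writing mode a vertical offset cannot be expressed inside a TJ array.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical stand-in for the proposed PDFGlyphVector: a list of (gid, dx, dy) tuples. */
final class PdfGlyphVectorSketch {

    /** One glyph plus an adjustment applied before it, in 1/1000 text space units. */
    private static final class Glyph {
        final int gid;
        final float dx;
        final float dy;

        Glyph(int gid, float dx, float dy) {
            this.gid = gid;
            this.dx = dx;
            this.dy = dy;
        }
    }

    private final List<Glyph> glyphs = new ArrayList<>();

    /** Appends a glyph; returns this so calls can be chained. */
    PdfGlyphVectorSketch add(int gid, float dx, float dy) {
        if (gid < 0 || gid > 0xFFFF) {
            throw new IllegalArgumentException("GID out of two-byte range: " + gid);
        }
        glyphs.add(new Glyph(gid, dx, dy));
        return this;
    }

    int size() {
        return glyphs.size();
    }

    /** Serialises the vector as a TJ operator with two-byte hex codes (code == GID assumed). */
    String toTJ() {
        List<String> parts = new ArrayList<>();
        for (Glyph g : glyphs) {
            if (g.dy != 0f) {
                throw new UnsupportedOperationException("vertical offsets are not covered by this sketch");
            }
            if (g.dx != 0f) {
                // TJ subtracts its numbers from the pen position, so the adjustment is negated.
                parts.add(Integer.toString(Math.round(-g.dx)));
            }
            parts.add(String.format(Locale.ROOT, "<%04X>", g.gid));
        }
        return "[" + String.join(" ", parts) + "] TJ";
    }
}
```

A real implementation would also need a way to express vertical offsets, for example by splitting the run and repositioning between pieces; the sketch deliberately leaves that out.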

Phew! That was a lot of information. Just to be clear, the current patch is not 
compatible with subsetting without making some changes.
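
The subsetting change described above can be sketched in Java as follows. This is a simplified model under stated assumptions, not the TTFSubsetter API: SubsetTrackerSketch, its method names, and the map standing in for the cmap table are hypothetical. It records code points as today, additionally accepts raw GIDs for substituted glyphs, and always keeps GID 0 (.notdef).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical tracker for glyphs that must survive subsetting. */
final class SubsetTrackerSketch {
    // Toy stand-in for the font's cmap table: code point -> GID.
    private final Map<Integer, Integer> cmap;
    private final SortedSet<Integer> codePoints = new TreeSet<>();
    private final SortedSet<Integer> extraGids = new TreeSet<>();

    SubsetTrackerSketch(Map<Integer, Integer> cmap) {
        this.cmap = new HashMap<>(cmap);
    }

    /** Existing behaviour: remember a character that was shown. */
    void addCodePoint(int codePoint) {
        codePoints.add(codePoint);
    }

    /** New behaviour: remember a glyph that only exists as a substitution result. */
    void addGlyphId(int gid) {
        extraGids.add(gid);
    }

    /** Resolves everything recorded so far into the set of GIDs to keep. */
    SortedSet<Integer> glyphsToKeep() {
        SortedSet<Integer> gids = new TreeSet<>();
        gids.add(0); // .notdef is always retained
        for (int codePoint : codePoints) {
            Integer gid = cmap.get(codePoint);
            if (gid != null) {
                gids.add(gid);
            }
        }
        gids.addAll(extraGids);
        return gids;
    }
}
```

In this model showText would call addGlyphId for each glyph produced by a substitution, while plain characters keep going through addCodePoint.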


was (Author: jahewson):
Hi guys, this is a really welcome contribution, thank you. With regards to 
PDFont#encode(String text) being non-final I can add some insight as I was the 
original designer of our current PDFont#encode mechanism.

Basically, the PDFont classes are designed to represent fonts identically to 
how they are represented when embedded in PDF files. So there's no support for 
OpenType, by design. A Type0 font knows nothing about OpenType.

So how can we use OpenType in PDFBox? The answer is that we do it one layer of 
abstraction up, during text _layout_ instead of text _encoding_. So you 
want to put your glyph substitution code inside PDPageContentStream#showText, 
actually you want 
[PDPageContentStream#showTextInternal|https://github.com/apache/pdfbox/blob/7e721643c0b1fca9fdc349f78431f36e68abc097/pdfbox/src/main/java/org/apache/pdfbox/contentstream/PDAbstractContentStream.java#L256].

That way PDFont#encode(String text) can stay non-final :)

*Technical Background*: In general, OpenType layouts consist of glyph 
_substitutions_ (via GSUB) and _positionings_ (via GPOS). Obviously it's not 
possible to handle positionings in PDFont#encode(), so that helps explain why 
showText() is the right place for OpenType, as showText performs both 
positioning and encoding.

> Enable rendering of Indian languages, by reading and utilizing the GSUB table
> -
>
>  

[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table

2018-04-14 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438540#comment-16438540
 ] 

John Hewson edited comment on PDFBOX-4189 at 4/15/18 12:17 AM:
---

Hi guys, this is a really welcome contribution, thank you. With regards to 
PDFont#encode(String text) being non-final I can add some insight as I was the 
original designer of our current PDFont#encode mechanism.

Basically, the PDFont classes are designed to represent fonts identically to 
how they are represented when embedded in PDF files. So there's no support for 
OpenType, by design. A Type0 font knows nothing about OpenType.

So how can we use OpenType in PDFBox? The answer is that we do it one layer of 
abstraction up, during text _layout_ instead of text _encoding_. So you 
want to put your glyph substitution code inside PDPageContentStream#showText, 
actually you want 
[PDPageContentStream#showTextInternal|https://github.com/apache/pdfbox/blob/7e721643c0b1fca9fdc349f78431f36e68abc097/pdfbox/src/main/java/org/apache/pdfbox/contentstream/PDAbstractContentStream.java#L256].

That way PDFont#encode(String text) can stay non-final :)

*Technical Background*: In general, OpenType layouts consist of 
glyph _substitutions_ (via GSUB) and _positionings_ (via GPOS). Obviously it's 
not possible to handle positionings in PDFont#encode(), so that helps explain 
why showText() is the right place for OpenType, as showText performs both 
positioning and encoding.


was (Author: jahewson):
Hi guys, this is a really welcome contribution, thank you. With regards to 
PDFont#encode(String text) being non-final I can add some insight as I was the 
original designer of our current PDFont#encode mechanism.

Basically, the PDFont classes are designed to represent fonts identically to 
how they are represented when embedded in PDF files. So there's no support for 
OpenType, by design. A Type0 font knows nothing about OpenType.

So how can we use OpenType in PDFBox? The answer is that we do it one layer of 
abstraction up, during text _layout_ instead of text _encoding_. So you 
want to put your glyph substitution code inside PDPageContentStream#showText, 
actually you want 
[PDPageContentStream#showTextInternal|https://github.com/apache/pdfbox/blob/7e721643c0b1fca9fdc349f78431f36e68abc097/pdfbox/src/main/java/org/apache/pdfbox/contentstream/PDAbstractContentStream.java#L256].

That way PDFont#encode(String text) can stay non-final :)

In general, OpenType layouts consist of glyph _substitutions_ (via GSUB) and 
_positionings_ (via GPOS). Obviously it's not possible to handle positionings 
in PDFont#encode(), so that helps explain why showText() is the right place for 
OpenType, as showText performs both positioning and encoding.

> Enable rendering of Indian languages, by reading and utilizing the GSUB table
> -
>
> Key: PDFBOX-4189
> URL: https://issues.apache.org/jira/browse/PDFBOX-4189
> Project: PDFBox
>  Issue Type: New Feature
>  Components: FontBox, PDModel
>Reporter: Palash Ray
>Priority: Major
> Attachments: Bengali-text-after.pdf, Bengali-text-before.pdf
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> Implemented proper rendering of Indian languages, which need extensive Glyph 
> substitution. The GSUB table has been read and used effectively to replace 
> some compound words with their respective Glyphs. All tests are passing. I 
> have tested this for the Bengali font. Please review these changes and let me 
> know if it makes sense to incorporate these.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table

2018-04-14 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438540#comment-16438540
 ] 

John Hewson edited comment on PDFBOX-4189 at 4/15/18 12:17 AM:
---

Hi guys, this is a really welcome contribution, thank you. With regards to 
PDFont#encode(String text) being non-final I can add some insight as I was the 
original designer of our current PDFont#encode mechanism.

Basically, the PDFont classes are designed to represent fonts identically to 
how they are represented when embedded in PDF files. So there's no support for 
OpenType, by design. A Type0 font knows nothing about OpenType.

So how can we use OpenType in PDFBox? The answer is that we do it one layer of 
abstraction up, during text _layout_ instead of text _encoding_. So you 
want to put your glyph substitution code inside PDPageContentStream#showText, 
actually you want 
[PDPageContentStream#showTextInternal|https://github.com/apache/pdfbox/blob/7e721643c0b1fca9fdc349f78431f36e68abc097/pdfbox/src/main/java/org/apache/pdfbox/contentstream/PDAbstractContentStream.java#L256].

That way PDFont#encode(String text) can stay non-final :)

*Technical Background*: In general, OpenType layouts consist of glyph 
_substitutions_ (via GSUB) and _positionings_ (via GPOS). Obviously it's not 
possible to handle positionings in PDFont#encode(), so that helps explain why 
showText() is the right place for OpenType, as showText performs both 
positioning and encoding.


was (Author: jahewson):
Hi guys, this is a really welcome contribution, thank you. With regards to 
PDFont#encode(String text) being non-final I can add some insight as I was the 
original designer of our current PDFont#encode mechanism.

Basically, the PDFont classes are designed to represent fonts identically to 
how they are represented when embedded in PDF files. So there's no support for 
OpenType, by design. A Type0 font knows nothing about OpenType.

So how can we use OpenType in PDFBox? The answer is that we do it one layer of 
abstraction up, during text _layout_ instead of text _encoding_. So you 
want to put your glyph substitution code inside PDPageContentStream#showText, 
actually you want 
[PDPageContentStream#showTextInternal|https://github.com/apache/pdfbox/blob/7e721643c0b1fca9fdc349f78431f36e68abc097/pdfbox/src/main/java/org/apache/pdfbox/contentstream/PDAbstractContentStream.java#L256].

That way PDFont#encode(String text) can stay non-final :)

*Technical Background*: In general, OpenType layouts consist of 
glyph _substitutions_ (via GSUB) and _positionings_ (via GPOS). Obviously it's 
not possible to handle positionings in PDFont#encode(), so that helps explain 
why showText() is the right place for OpenType, as showText performs both 
positioning and encoding.




[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table

2018-04-14 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438540#comment-16438540
 ] 

John Hewson edited comment on PDFBOX-4189 at 4/15/18 12:16 AM:
---

Hi guys, this is a really welcome contribution, thank you. With regards to 
PDFont#encode(String text) being non-final I can add some insight as I was the 
original designer of our current PDFont#encode mechanism.

Basically, the PDFont classes are designed to represent fonts identically to 
how they are represented when embedded in PDF files. So there's no support for 
OpenType, by design. A Type0 font knows nothing about OpenType.

So how can we use OpenType in PDFBox? The answer is that we do it one layer of 
abstraction up, during text _layout_ instead of text _encoding_. So you 
want to put your glyph substitution code inside PDPageContentStream#showText, 
actually you want 
[PDPageContentStream#showTextInternal|https://github.com/apache/pdfbox/blob/7e721643c0b1fca9fdc349f78431f36e68abc097/pdfbox/src/main/java/org/apache/pdfbox/contentstream/PDAbstractContentStream.java#L256].

That way PDFont#encode(String text) can stay non-final :)

In general, OpenType layouts consist of glyph _substitutions_ (via GSUB) and 
_positionings_ (via GPOS). Obviously it's not possible to handle positionings 
in PDFont#encode(), so that helps explain why showText() is the right place for 
OpenType, as showText performs both positioning and encoding.


was (Author: jahewson):
Hi guys, this is a really welcome contribution, thank you. With regards to 
PDFont#encode(String text) being non-final I can add some insight as I was the 
original designer of our current PDFont#encode mechanism.

Basically, the PDFont classes are designed to represent fonts identically to 
how they are represented when embedded in PDF files. So there's no support for 
OpenType, by design. A Type0 font knows nothing about OpenType.

So how can we use OpenType in PDFBox? The answer is that we do it one layer of 
abstraction up, during text _layout_ instead of text _encoding_. So you 
want to put your glyph substitution code inside PDPageContentStream#showText, 
actually you want 
[PDPageContentStream#showTextInternal|https://github.com/apache/pdfbox/blob/7e721643c0b1fca9fdc349f78431f36e68abc097/pdfbox/src/main/java/org/apache/pdfbox/contentstream/PDAbstractContentStream.java#L256].

That way PDFont#encode(String text) can stay non-final :)




[jira] [Comment Edited] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table

2018-04-14 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438540#comment-16438540
 ] 

John Hewson edited comment on PDFBOX-4189 at 4/15/18 12:11 AM:
---

Hi guys, this is a really welcome contribution, thank you. With regards to 
PDFont#encode(String text) being non-final I can add some insight as I was the 
original designer of our current PDFont#encode mechanism.

Basically, the PDFont classes are designed to represent fonts identically to 
how they are represented when embedded in PDF files. So there's no support for 
OpenType, by design. A Type0 font knows nothing about OpenType.

So how can we use OpenType in PDFBox? The answer is that we do it one layer of 
abstraction up, during text _layout_ instead of text _encoding_. So you 
want to put your glyph substitution code inside PDPageContentStream#showText, 
actually you want 
[PDPageContentStream#showTextInternal|https://github.com/apache/pdfbox/blob/7e721643c0b1fca9fdc349f78431f36e68abc097/pdfbox/src/main/java/org/apache/pdfbox/contentstream/PDAbstractContentStream.java#L256].

That way PDFont#encode(String text) can stay non-final :)


was (Author: jahewson):
Hi guys, this is a really welcome contribution, thank you. With regards to 
PDFont#encode(String text) being non-final I can add some insight as I was the 
original designer of our current PDFont#encode mechanism.

Basically, the PDFont classes are designed to represent fonts identically to 
how they are represented when embedded in PDF files. So there's no support for 
OpenType, by design. A Type0 font knows nothing about OpenType (by design).

So how can we use OpenType in PDFBox? The answer is that we do it one layer of 
abstraction up, during text _layout_ instead of text _encoding_. So you 
want to put your glyph substitution code inside PDPageContentStream#showText, 
actually you want 
[PDPageContentStream#showTextInternal|https://github.com/apache/pdfbox/blob/7e721643c0b1fca9fdc349f78431f36e68abc097/pdfbox/src/main/java/org/apache/pdfbox/contentstream/PDAbstractContentStream.java#L256].

That way PDFont#encode(String text) can stay non-final :)




[jira] [Commented] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table

2018-04-14 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438540#comment-16438540
 ] 

John Hewson commented on PDFBOX-4189:
-

Hi guys, this is a really welcome contribution, thank you. With regards to 
PDFont#encode(String text) being non-final I can add some insight as I was the 
original designer of our current PDFont#encode mechanism.

Basically, the PDFont classes are designed to represent fonts identically to 
how they are represented when embedded in PDF files. So there's no support for 
OpenType, by design. A Type0 font knows nothing about OpenType (by design).

So how can we use OpenType in PDFBox? The answer is that we do it one layer of 
abstraction up, during text _layout_ instead of text _encoding_. So you 
want to put your glyph substitution code inside PDPageContentStream#showText, 
actually you want 
[PDPageContentStream#showTextInternal|https://github.com/apache/pdfbox/blob/7e721643c0b1fca9fdc349f78431f36e68abc097/pdfbox/src/main/java/org/apache/pdfbox/contentstream/PDAbstractContentStream.java#L256].

That way PDFont#encode(String text) can stay non-final :)




[jira] [Commented] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table

2018-04-14 Thread Palash Ray (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438487#comment-16438487
 ] 

Palash Ray commented on PDFBOX-4189:


I have pushed some changes which take care of most of the issues that you have 
pointed out, except:
 # subsetting
 # BengaliPdfGenerationHelloWorld should be integrated into the 
EmbeddedFonts.java example

I will take care of these as well. Meanwhile, please let me know if any other 
changes are needed.

 

Thanks,

Palash.




Build failed in Jenkins: PDFBox-2.0.x #985

2018-04-14 Thread Apache Jenkins Server
See 

--
Started by user tilman
[EnvInject] - Loading node environment variables.
Building remotely on H28 (ubuntu xenial) in workspace 

Cleaning up 
Deleting 
Updating http://svn.apache.org/repos/asf/pdfbox/branches/2.0 at revision 
'2018-04-14T18:59:25.140 +'
At revision 1829162

No changes for http://svn.apache.org/repos/asf/pdfbox/branches/2.0 since the 
previous build
Parsing POMs
Established TCP socket on 38009
maven35-agent.jar already up to date
maven35-interceptor.jar already up to date
maven3-interceptor-commons.jar already up to date
[2.0] $ /home/jenkins/tools/java/jdk1.8.0_66-unlimited-security/bin/java -Xmx1g 
-XX:MaxPermSize=300m -cp 
/home/jenkins/jenkins-slave/maven35-agent.jar:/home/jenkins/tools/maven/apache-maven-3.5.0/boot/plexus-classworlds-2.5.2.jar:/home/jenkins/tools/maven/apache-maven-3.5.0/conf/logging
 jenkins.maven3.agent.Maven35Main /home/jenkins/tools/maven/apache-maven-3.5.0 
/home/jenkins/jenkins-slave/slave.jar 
/home/jenkins/jenkins-slave/maven35-interceptor.jar 
/home/jenkins/jenkins-slave/maven3-interceptor-commons.jar 38009
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=300m; 
support was removed in 8.0
<===[JENKINS REMOTING CAPACITY]===>   channel started
Executing Maven:  -B -f 
 
-Dmaven.repo.local=/home/jenkins/jenkins-slave/maven-repositories/1 clean 
deploy -Ppedantic -Dskip-bavaria=false
[INFO] Scanning for projects...
[WARNING] The project org.apache.pdfbox:pdfbox-parent:pom:2.0.10-SNAPSHOT uses 
prerequisites which is only intended for maven-plugin projects but not for non 
maven-plugin projects. For such purposes you should use the 
maven-enforcer-plugin. See 
https://maven.apache.org/enforcer/enforcer-rules/requireMavenVersion.html
[INFO] 
[INFO] Reactor Build Order:
[INFO] 
[INFO] PDFBox parent
[INFO] Apache FontBox
[INFO] Apache XmpBox
[INFO] Apache PDFBox
[INFO] Apache Preflight
[INFO] Apache Preflight application
[INFO] Apache PDFBox Debugger
[INFO] Apache PDFBox tools
[INFO] Apache PDFBox application
[INFO] Apache PDFBox Debugger application
[INFO] Apache PDFBox examples
[INFO] PDFBox reactor
[INFO] 
[INFO] 
[INFO] Building PDFBox parent 2.0.10-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-clean-plugin:3.0.0:clean (default-clean) @ pdfbox-parent ---
[TASKS] Scanning folder 
' for files matching 
the pattern '**/*.java' - excludes: 
[TASKS] Found 0 files to scan for tasks
Found 0 open tasks.
[TASKS] Computing warning deltas based on reference build #982
[INFO] 
[INFO] --- maven-enforcer-plugin:1.4.1:enforce (enforce-maven-version) @ 
pdfbox-parent ---
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.5:process (process-resource-bundles) 
@ pdfbox-parent ---
[INFO] 
[INFO] --- animal-sniffer-maven-plugin:1.14:check (check-java-version) @ 
pdfbox-parent ---
[INFO] Checking unresolved references to org.codehaus.mojo.signature:java16:1.0
[INFO] 
[INFO] --- maven-site-plugin:3.7:attach-descriptor (attach-descriptor) @ 
pdfbox-parent ---
[INFO] No site descriptor found: nothing to attach.
[INFO] 
[INFO] >>> maven-source-plugin:3.0.1:jar (attach-sources) > generate-sources @ 
pdfbox-parent >>>
[INFO] 
[INFO] --- maven-enforcer-plugin:1.4.1:enforce (enforce-maven-version) @ 
pdfbox-parent ---
[WARNING] Failed to getClass for org.apache.maven.plugins.source.SourceJarMojo
[INFO] 
[INFO] <<< maven-source-plugin:3.0.1:jar (attach-sources) < generate-sources @ 
pdfbox-parent <<<
[INFO] 
[INFO] 
[INFO] --- maven-source-plugin:3.0.1:jar (attach-sources) @ pdfbox-parent ---
[INFO] 
[INFO] --- apache-rat-plugin:0.12:check (default) @ pdfbox-parent ---
[INFO] Enabled default license matchers.
[INFO] Will parse SCM ignores for exclusions...
[INFO] Finished adding exclusions from SCM ignore files.
[INFO] 61 implicit excludes (use -debug for more details).
[INFO] Exclude: release.properties
[INFO] 1 resources included (use -debug for more details)
[INFO] Rat check: Summary over all files. Unapproved: 0, unknown: 0, generated: 
0, approved: 1 licenses.
[INFO] 
[INFO] --- dependency-check-maven:3.1.2:check (default) @ pdfbox-parent ---
[INFO] Checking for updates
[INFO] starting getUpdatesNeeded() ...
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: 

Build failed in Jenkins: PDFBox-2.0.x » PDFBox parent #985

2018-04-14 Thread Apache Jenkins Server
See 


--
Established TCP socket on 38009
maven35-agent.jar already up to date
maven35-interceptor.jar already up to date
maven3-interceptor-commons.jar already up to date
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=300m; 
support was removed in 8.0
<===[JENKINS REMOTING CAPACITY]===>   channel started
Executing Maven:  -B -f 
/home/jenkins/jenkins-slave/workspace/PDFBox-2.0.x/2.0/pom.xml 
-Dmaven.repo.local=/home/jenkins/jenkins-slave/maven-repositories/1 clean 
deploy -Ppedantic -Dskip-bavaria=false
[INFO] Scanning for projects...
[WARNING] The project org.apache.pdfbox:pdfbox-parent:pom:2.0.10-SNAPSHOT uses 
prerequisites which is only intended for maven-plugin projects but not for non 
maven-plugin projects. For such purposes you should use the 
maven-enforcer-plugin. See 
https://maven.apache.org/enforcer/enforcer-rules/requireMavenVersion.html
[INFO] 
[INFO] Reactor Build Order:
[INFO] 
[INFO] PDFBox parent
[INFO] Apache FontBox
[INFO] Apache XmpBox
[INFO] Apache PDFBox
[INFO] Apache Preflight
[INFO] Apache Preflight application
[INFO] Apache PDFBox Debugger
[INFO] Apache PDFBox tools
[INFO] Apache PDFBox application
[INFO] Apache PDFBox Debugger application
[INFO] Apache PDFBox examples
[INFO] PDFBox reactor
[INFO] 
[INFO] 
[INFO] Building PDFBox parent 2.0.10-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-clean-plugin:3.0.0:clean (default-clean) @ pdfbox-parent ---
[TASKS] Scanning folder 
'
 for files matching the pattern '**/*.java' - excludes: 
[TASKS] Found 0 files to scan for tasks
Found 0 open tasks.
[TASKS] Computing warning deltas based on reference build #982
[INFO] 
[INFO] --- maven-enforcer-plugin:1.4.1:enforce (enforce-maven-version) @ 
pdfbox-parent ---
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.5:process (process-resource-bundles) 
@ pdfbox-parent ---
[INFO] 
[INFO] --- animal-sniffer-maven-plugin:1.14:check (check-java-version) @ 
pdfbox-parent ---
[INFO] Checking unresolved references to org.codehaus.mojo.signature:java16:1.0
[INFO] 
[INFO] --- maven-site-plugin:3.7:attach-descriptor (attach-descriptor) @ 
pdfbox-parent ---
[INFO] No site descriptor found: nothing to attach.
[INFO] 
[INFO] >>> maven-source-plugin:3.0.1:jar (attach-sources) > generate-sources @ 
pdfbox-parent >>>
[INFO] 
[INFO] --- maven-enforcer-plugin:1.4.1:enforce (enforce-maven-version) @ 
pdfbox-parent ---
[WARNING] Failed to getClass for org.apache.maven.plugins.source.SourceJarMojo
[INFO] 
[INFO] <<< maven-source-plugin:3.0.1:jar (attach-sources) < generate-sources @ 
pdfbox-parent <<<
[INFO] 
[INFO] 
[INFO] --- maven-source-plugin:3.0.1:jar (attach-sources) @ pdfbox-parent ---
[INFO] 
[INFO] --- apache-rat-plugin:0.12:check (default) @ pdfbox-parent ---
[INFO] Enabled default license matchers.
[INFO] Will parse SCM ignores for exclusions...
[INFO] Finished adding exclusions from SCM ignore files.
[INFO] 61 implicit excludes (use -debug for more details).
[INFO] Exclude: release.properties
[INFO] 1 resources included (use -debug for more details)
[INFO] Rat check: Summary over all files. Unapproved: 0, unknown: 0, generated: 
0, approved: 1 licenses.
[INFO] 
[INFO] --- dependency-check-maven:3.1.2:check (default) @ pdfbox-parent ---
[INFO] Checking for updates
[INFO] starting getUpdatesNeeded() ...
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[WARNING] Unable to download the NVD CVE data; the results may not include the 
most recent CPE/CVEs from the NVD.
[INFO] If you are behind a proxy you may need to configure dependency-check to 
use the proxy.
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[WARNING] Unable to update Cached Web DataSource, using local data instead. 
Results may not include recent vulnerabilities.
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[INFO] Analysis Started
[INFO] Finished File Name Analyzer (0 seconds)
[INFO] Finished Dependency Merging Analyzer (0 seconds)
[INFO] Finished Version Filter Analyzer (0 seconds)
[INFO] Finished Hint Analyzer (0 seconds)

Build failed in Jenkins: PDFBox-2.0.x #984

2018-04-14 Thread Apache Jenkins Server
See 


Changes:

[msahyoun] PDFBOX-4182, PDFBOX-4188: correct javadoc

[msahyoun] PDFBOX-4182, PDFBOX-4188: add new merge mode which closes the source 
PDDocument after the individual merge; early implementation

--
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building remotely on H28 (ubuntu xenial) in workspace 

Cleaning up 
Deleting 
Updating http://svn.apache.org/repos/asf/pdfbox/branches/2.0 at revision 
'2018-04-14T18:29:12.065 +'
U pdfbox/src/main/java/org/apache/pdfbox/multipdf/PDFMergerUtility.java
At revision 1829162

Parsing POMs
Established TCP socket on 34145
maven35-agent.jar already up to date
maven35-interceptor.jar already up to date
maven3-interceptor-commons.jar already up to date
[2.0] $ /home/jenkins/tools/java/jdk1.8.0_66-unlimited-security/bin/java -Xmx1g 
-XX:MaxPermSize=300m -cp 
/home/jenkins/jenkins-slave/maven35-agent.jar:/home/jenkins/tools/maven/apache-maven-3.5.0/boot/plexus-classworlds-2.5.2.jar:/home/jenkins/tools/maven/apache-maven-3.5.0/conf/logging
 jenkins.maven3.agent.Maven35Main /home/jenkins/tools/maven/apache-maven-3.5.0 
/home/jenkins/jenkins-slave/slave.jar 
/home/jenkins/jenkins-slave/maven35-interceptor.jar 
/home/jenkins/jenkins-slave/maven3-interceptor-commons.jar 34145
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=300m; 
support was removed in 8.0
<===[JENKINS REMOTING CAPACITY]===>   channel started
Executing Maven:  -B -f 
 
-Dmaven.repo.local=/home/jenkins/jenkins-slave/maven-repositories/0 clean 
deploy -Ppedantic -Dskip-bavaria=false
[INFO] Scanning for projects...
[WARNING] The project org.apache.pdfbox:pdfbox-parent:pom:2.0.10-SNAPSHOT uses 
prerequisites which is only intended for maven-plugin projects but not for non 
maven-plugin projects. For such purposes you should use the 
maven-enforcer-plugin. See 
https://maven.apache.org/enforcer/enforcer-rules/requireMavenVersion.html
[INFO] 
[INFO] Reactor Build Order:
[INFO] 
[INFO] PDFBox parent
[INFO] Apache FontBox
[INFO] Apache XmpBox
[INFO] Apache PDFBox
[INFO] Apache Preflight
[INFO] Apache Preflight application
[INFO] Apache PDFBox Debugger
[INFO] Apache PDFBox tools
[INFO] Apache PDFBox application
[INFO] Apache PDFBox Debugger application
[INFO] Apache PDFBox examples
[INFO] PDFBox reactor
[INFO] 
[INFO] 
[INFO] Building PDFBox parent 2.0.10-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-clean-plugin:3.0.0:clean (default-clean) @ pdfbox-parent ---
[TASKS] Scanning folder 
' for files matching 
the pattern '**/*.java' - excludes: 
[TASKS] Found 0 files to scan for tasks
Found 0 open tasks.
[TASKS] Computing warning deltas based on reference build #982
[INFO] 
[INFO] --- maven-enforcer-plugin:1.4.1:enforce (enforce-maven-version) @ 
pdfbox-parent ---
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.5:process (process-resource-bundles) 
@ pdfbox-parent ---
[INFO] 
[INFO] --- animal-sniffer-maven-plugin:1.14:check (check-java-version) @ 
pdfbox-parent ---
[INFO] Checking unresolved references to org.codehaus.mojo.signature:java16:1.0
[INFO] 
[INFO] --- maven-site-plugin:3.7:attach-descriptor (attach-descriptor) @ 
pdfbox-parent ---
[INFO] No site descriptor found: nothing to attach.
[INFO] 
[INFO] >>> maven-source-plugin:3.0.1:jar (attach-sources) > generate-sources @ 
pdfbox-parent >>>
[INFO] 
[INFO] --- maven-enforcer-plugin:1.4.1:enforce (enforce-maven-version) @ 
pdfbox-parent ---
[WARNING] Failed to getClass for org.apache.maven.plugins.source.SourceJarMojo
[INFO] 
[INFO] <<< maven-source-plugin:3.0.1:jar (attach-sources) < generate-sources @ 
pdfbox-parent <<<
[INFO] 
[INFO] 
[INFO] --- maven-source-plugin:3.0.1:jar (attach-sources) @ pdfbox-parent ---
[INFO] 
[INFO] --- apache-rat-plugin:0.12:check (default) @ pdfbox-parent ---
[INFO] Enabled default license matchers.
[INFO] Will parse SCM ignores for exclusions...
[INFO] Finished adding exclusions from SCM ignore files.
[INFO] 61 implicit excludes (use -debug for more details).
[INFO] Exclude: release.properties
[INFO] 1 resources included (use -debug for more details)
[INFO] Rat check: Summary over all files. Unapproved: 0, unknown: 0, generated: 
0, approved: 1 licenses.
[INFO] 
[INFO] --- dependency-check-maven:3.1.2:check (default) @ pdfbox-parent ---
[INFO] Checking for updates
[INFO] starting getUpdatesNeeded() ...
[ERROR] IO Exception: Network is unreachable
[ERROR] 

Build failed in Jenkins: PDFBox-2.0.x » PDFBox parent #984

2018-04-14 Thread Apache Jenkins Server
See 


--
Established TCP socket on 34145
maven35-agent.jar already up to date
maven35-interceptor.jar already up to date
maven3-interceptor-commons.jar already up to date
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=300m; 
support was removed in 8.0
<===[JENKINS REMOTING CAPACITY]===>   channel started
Executing Maven:  -B -f 
/home/jenkins/jenkins-slave/workspace/PDFBox-2.0.x/2.0/pom.xml 
-Dmaven.repo.local=/home/jenkins/jenkins-slave/maven-repositories/0 clean 
deploy -Ppedantic -Dskip-bavaria=false
[INFO] Scanning for projects...
[WARNING] The project org.apache.pdfbox:pdfbox-parent:pom:2.0.10-SNAPSHOT uses 
prerequisites which is only intended for maven-plugin projects but not for non 
maven-plugin projects. For such purposes you should use the 
maven-enforcer-plugin. See 
https://maven.apache.org/enforcer/enforcer-rules/requireMavenVersion.html
[INFO] 
[INFO] Reactor Build Order:
[INFO] 
[INFO] PDFBox parent
[INFO] Apache FontBox
[INFO] Apache XmpBox
[INFO] Apache PDFBox
[INFO] Apache Preflight
[INFO] Apache Preflight application
[INFO] Apache PDFBox Debugger
[INFO] Apache PDFBox tools
[INFO] Apache PDFBox application
[INFO] Apache PDFBox Debugger application
[INFO] Apache PDFBox examples
[INFO] PDFBox reactor
[INFO] 
[INFO] 
[INFO] Building PDFBox parent 2.0.10-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-clean-plugin:3.0.0:clean (default-clean) @ pdfbox-parent ---
[TASKS] Scanning folder 
'
 for files matching the pattern '**/*.java' - excludes: 
[TASKS] Found 0 files to scan for tasks
Found 0 open tasks.
[TASKS] Computing warning deltas based on reference build #982
[INFO] 
[INFO] --- maven-enforcer-plugin:1.4.1:enforce (enforce-maven-version) @ 
pdfbox-parent ---
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.5:process (process-resource-bundles) 
@ pdfbox-parent ---
[INFO] 
[INFO] --- animal-sniffer-maven-plugin:1.14:check (check-java-version) @ 
pdfbox-parent ---
[INFO] Checking unresolved references to org.codehaus.mojo.signature:java16:1.0
[INFO] 
[INFO] --- maven-site-plugin:3.7:attach-descriptor (attach-descriptor) @ 
pdfbox-parent ---
[INFO] No site descriptor found: nothing to attach.
[INFO] 
[INFO] >>> maven-source-plugin:3.0.1:jar (attach-sources) > generate-sources @ 
pdfbox-parent >>>
[INFO] 
[INFO] --- maven-enforcer-plugin:1.4.1:enforce (enforce-maven-version) @ 
pdfbox-parent ---
[WARNING] Failed to getClass for org.apache.maven.plugins.source.SourceJarMojo
[INFO] 
[INFO] <<< maven-source-plugin:3.0.1:jar (attach-sources) < generate-sources @ 
pdfbox-parent <<<
[INFO] 
[INFO] 
[INFO] --- maven-source-plugin:3.0.1:jar (attach-sources) @ pdfbox-parent ---
[INFO] 
[INFO] --- apache-rat-plugin:0.12:check (default) @ pdfbox-parent ---
[INFO] Enabled default license matchers.
[INFO] Will parse SCM ignores for exclusions...
[INFO] Finished adding exclusions from SCM ignore files.
[INFO] 61 implicit excludes (use -debug for more details).
[INFO] Exclude: release.properties
[INFO] 1 resources included (use -debug for more details)
[INFO] Rat check: Summary over all files. Unapproved: 0, unknown: 0, generated: 
0, approved: 1 licenses.
[INFO] 
[INFO] --- dependency-check-maven:3.1.2:check (default) @ pdfbox-parent ---
[INFO] Checking for updates
[INFO] starting getUpdatesNeeded() ...
[ERROR] IO Exception: Network is unreachable
[ERROR] IO Exception: Network is unreachable
[ERROR] IO Exception: Network is unreachable
[ERROR] IO Exception: Network is unreachable
[ERROR] IO Exception: Network is unreachable
[ERROR] IO Exception: Network is unreachable
[ERROR] IO Exception: Network is unreachable
[ERROR] IO Exception: Network is unreachable
[ERROR] IO Exception: Network is unreachable
[ERROR] IO Exception: Network is unreachable
[ERROR] IO Exception: Network is unreachable
[ERROR] IO Exception: Network is unreachable
[ERROR] IO Exception: Network is unreachable
[ERROR] IO Exception: Network is unreachable
[ERROR] IO Exception: Network is unreachable
[ERROR] IO Exception: Network is unreachable
[ERROR] IO Exception: Network is unreachable
[ERROR] IO Exception: Network is unreachable
[WARNING] Unable to download the NVD CVE data; the results may not include the 
most recent CPE/CVEs from the NVD.
[INFO] If you are behind a proxy you may need to configure dependency-check to 
use the proxy.
[WARNING] Unable to update Cached Web DataSource, using local data instead. 
Results may not include recent vulnerabilities.
[INFO] Analysis Started
[INFO] Finished File Name Analyzer (0 seconds)
[INFO] Finished Dependency Merging Analyzer (0 seconds)
[INFO] 

Build failed in Jenkins: PDFBox-Trunk-jdk9 #430

2018-04-14 Thread Apache Jenkins Server
See 


Changes:

[msahyoun] PDFBOX-4182, PDFBOX-4188: correct javadoc

--
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building remotely on H21 (ubuntu xenial) in workspace 

Cleaning up 
Deleting 

Updating http://svn.apache.org/repos/asf/pdfbox/trunk at revision 
'2018-04-14T18:27:15.615 +'
U pdfbox/src/main/java/org/apache/pdfbox/multipdf/PDFMergerUtility.java
At revision 1829161

Parsing POMs
Established TCP socket on 45817
maven35-agent.jar already up to date
maven35-interceptor.jar already up to date
maven3-interceptor-commons.jar already up to date
[trunk] $ /home/jenkins/tools/java/jdk-9-b181-unlimited-security/bin/java 
-Xmx1g -XX:MaxPermSize=300m -cp 
/home/jenkins/jenkins-slave/maven35-agent.jar:/home/jenkins/tools/maven/apache-maven-3.5.0/boot/plexus-classworlds-2.5.2.jar:/home/jenkins/tools/maven/apache-maven-3.5.0/conf/logging
 jenkins.maven3.agent.Maven35Main /home/jenkins/tools/maven/apache-maven-3.5.0 
/home/jenkins/jenkins-slave/slave.jar 
/home/jenkins/jenkins-slave/maven35-interceptor.jar 
/home/jenkins/jenkins-slave/maven3-interceptor-commons.jar 45817
Java HotSpot(TM) 64-Bit Server VM warning: Ignoring option MaxPermSize; support 
was removed in 8.0
<===[JENKINS REMOTING CAPACITY]===>   channel started
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by hudson.remoting.RemoteClassLoader 
(file:/home/jenkins/jenkins-slave/slave.jar) to method 
java.lang.ClassLoader.getClassLoadingLock(java.lang.String)
WARNING: Please consider reporting this to the maintainers of 
hudson.remoting.RemoteClassLoader
WARNING: Use --illegal-access=warn to enable warnings of further illegal 
reflective access operations
WARNING: All illegal access operations will be denied in a future release
Executing Maven:  -B -f 
 
-Dmaven.repo.local=/home/jenkins/jenkins-slave/maven-repositories/0 clean 
install -Ppedantic,jdk9 -Dskip-bavaria=false
[INFO] Scanning for projects...
[WARNING] The project org.apache.pdfbox:pdfbox-parent:pom:3.0.0-SNAPSHOT uses 
prerequisites which is only intended for maven-plugin projects but not for non 
maven-plugin projects. For such purposes you should use the 
maven-enforcer-plugin. See 
https://maven.apache.org/enforcer/enforcer-rules/requireMavenVersion.html
[INFO] 
[INFO] Reactor Build Order:
[INFO] 
[INFO] PDFBox parent
[INFO] Apache FontBox
[INFO] Apache XmpBox
[INFO] Apache PDFBox
[INFO] Apache Preflight
[INFO] Apache Preflight application
[INFO] Apache PDFBox Debugger
[INFO] Apache PDFBox tools
[INFO] Apache PDFBox application
[INFO] Apache PDFBox Debugger application
[INFO] Apache PDFBox examples
[INFO] Apache PDFBox
[INFO] 
[INFO] 
[INFO] Building PDFBox parent 3.0.0-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-clean-plugin:3.0.0:clean (default-clean) @ pdfbox-parent ---
[TASKS] Scanning folder 
' for files 
matching the pattern '**/*.java' - excludes: 
[TASKS] Found 0 files to scan for tasks
Found 0 open tasks.
[TASKS] Computing warning deltas based on reference build #428
[INFO] 
[INFO] --- maven-enforcer-plugin:1.4.1:enforce (enforce-maven-version) @ 
pdfbox-parent ---
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.5:process (process-resource-bundles) 
@ pdfbox-parent ---
[INFO] 
[INFO] --- animal-sniffer-maven-plugin:1.15:check (check-java-version) @ 
pdfbox-parent ---
[INFO] Checking unresolved references to org.codehaus.mojo.signature:java17:1.0
[INFO] 
[INFO] --- maven-site-plugin:3.7:attach-descriptor (attach-descriptor) @ 
pdfbox-parent ---
[INFO] No site descriptor found: nothing to attach.
[INFO] 
[INFO] >>> maven-source-plugin:3.0.1:jar (attach-sources) > generate-sources @ 
pdfbox-parent >>>
[INFO] 
[INFO] --- maven-enforcer-plugin:1.4.1:enforce (enforce-maven-version) @ 
pdfbox-parent ---
[WARNING] Failed to getClass for org.apache.maven.plugins.source.SourceJarMojo
[INFO] 
[INFO] <<< maven-source-plugin:3.0.1:jar (attach-sources) < generate-sources @ 
pdfbox-parent <<<
[INFO] 
[INFO] 
[INFO] --- maven-source-plugin:3.0.1:jar (attach-sources) @ pdfbox-parent ---
[INFO] 
[INFO] --- apache-rat-plugin:0.12:check (default) @ pdfbox-parent ---
[INFO] Enabled default license matchers.
[INFO] Will parse SCM ignores for exclusions...
[INFO] Finished adding exclusions from SCM ignore files.
[INFO] 61 implicit excludes (use -debug for more 

Build failed in Jenkins: PDFBox-Trunk-jdk9 » PDFBox parent #430

2018-04-14 Thread Apache Jenkins Server
See 


--
Established TCP socket on 45817
maven35-agent.jar already up to date
maven35-interceptor.jar already up to date
maven3-interceptor-commons.jar already up to date
Java HotSpot(TM) 64-Bit Server VM warning: Ignoring option MaxPermSize; support 
was removed in 8.0
<===[JENKINS REMOTING CAPACITY]===>   channel started
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by hudson.remoting.RemoteClassLoader 
(file:/home/jenkins/jenkins-slave/slave.jar) to method 
java.lang.ClassLoader.getClassLoadingLock(java.lang.String)
WARNING: Please consider reporting this to the maintainers of 
hudson.remoting.RemoteClassLoader
WARNING: Use --illegal-access=warn to enable warnings of further illegal 
reflective access operations
WARNING: All illegal access operations will be denied in a future release
Executing Maven:  -B -f 
/home/jenkins/jenkins-slave/workspace/PDFBox-Trunk-jdk9/trunk/pom.xml 
-Dmaven.repo.local=/home/jenkins/jenkins-slave/maven-repositories/0 clean 
install -Ppedantic,jdk9 -Dskip-bavaria=false
[INFO] Scanning for projects...
[WARNING] The project org.apache.pdfbox:pdfbox-parent:pom:3.0.0-SNAPSHOT uses 
prerequisites which is only intended for maven-plugin projects but not for non 
maven-plugin projects. For such purposes you should use the 
maven-enforcer-plugin. See 
https://maven.apache.org/enforcer/enforcer-rules/requireMavenVersion.html
[INFO] 
[INFO] Reactor Build Order:
[INFO] 
[INFO] PDFBox parent
[INFO] Apache FontBox
[INFO] Apache XmpBox
[INFO] Apache PDFBox
[INFO] Apache Preflight
[INFO] Apache Preflight application
[INFO] Apache PDFBox Debugger
[INFO] Apache PDFBox tools
[INFO] Apache PDFBox application
[INFO] Apache PDFBox Debugger application
[INFO] Apache PDFBox examples
[INFO] Apache PDFBox
[INFO] 
[INFO] 
[INFO] Building PDFBox parent 3.0.0-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-clean-plugin:3.0.0:clean (default-clean) @ pdfbox-parent ---
[TASKS] Scanning folder 
'
 for files matching the pattern '**/*.java' - excludes: 
[TASKS] Found 0 files to scan for tasks
Found 0 open tasks.
[TASKS] Computing warning deltas based on reference build #428
[INFO] 
[INFO] --- maven-enforcer-plugin:1.4.1:enforce (enforce-maven-version) @ 
pdfbox-parent ---
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.5:process (process-resource-bundles) 
@ pdfbox-parent ---
[INFO] 
[INFO] --- animal-sniffer-maven-plugin:1.15:check (check-java-version) @ 
pdfbox-parent ---
[INFO] Checking unresolved references to org.codehaus.mojo.signature:java17:1.0
[INFO] 
[INFO] --- maven-site-plugin:3.7:attach-descriptor (attach-descriptor) @ 
pdfbox-parent ---
[INFO] No site descriptor found: nothing to attach.
[INFO] 
[INFO] >>> maven-source-plugin:3.0.1:jar (attach-sources) > generate-sources @ 
pdfbox-parent >>>
[INFO] 
[INFO] --- maven-enforcer-plugin:1.4.1:enforce (enforce-maven-version) @ 
pdfbox-parent ---
[WARNING] Failed to getClass for org.apache.maven.plugins.source.SourceJarMojo
[INFO] 
[INFO] <<< maven-source-plugin:3.0.1:jar (attach-sources) < generate-sources @ 
pdfbox-parent <<<
[INFO] 
[INFO] 
[INFO] --- maven-source-plugin:3.0.1:jar (attach-sources) @ pdfbox-parent ---
[INFO] 
[INFO] --- apache-rat-plugin:0.12:check (default) @ pdfbox-parent ---
[INFO] Enabled default license matchers.
[INFO] Will parse SCM ignores for exclusions...
[INFO] Finished adding exclusions from SCM ignore files.
[INFO] 61 implicit excludes (use -debug for more details).
[INFO] Exclude: release.properties
[INFO] 1 resources included (use -debug for more details)
[INFO] Rat check: Summary over all files. Unapproved: 0, unknown: 0, generated: 
0, approved: 1 licenses.
[INFO] 
[INFO] --- dependency-check-maven:3.1.2:check (default) @ pdfbox-parent ---
[INFO] Checking for updates
[INFO] starting getUpdatesNeeded() ...
[ERROR] IO Exception: Network is unreachable (connect failed)
[ERROR] IO Exception: Network is unreachable (connect failed)
[ERROR] IO Exception: Network is unreachable (connect failed)
[ERROR] IO Exception: Network is unreachable (connect failed)
[ERROR] IO Exception: Network is unreachable (connect failed)
[ERROR] IO Exception: Network is unreachable (connect failed)
[ERROR] IO Exception: Network is unreachable (connect failed)
[ERROR] IO Exception: Network is unreachable (connect failed)
[ERROR] IO Exception: Network is unreachable (connect failed)
[ERROR] IO Exception: Network is unreachable (connect failed)
[ERROR] IO Exception: Network is unreachable (connect failed)
[ERROR] IO Exception: Network is unreachable (connect 

Build failed in Jenkins: PDFBox-trunk #3970

2018-04-14 Thread Apache Jenkins Server
See 


Changes:

[msahyoun] PDFBOX-4182, PDFBOX-4188: correct javadoc

[msahyoun] PDFBOX-4182, PDFBOX-4188: add new merge mode which closes the source 
PDDocument after the individual merge; early implementation

--
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building remotely on H25 (ubuntu xenial) in workspace 

Cleaning up 
Deleting 
Updating http://svn.apache.org/repos/asf/pdfbox/trunk at revision 
'2018-04-14T17:58:08.916 +'
U pdfbox/src/main/java/org/apache/pdfbox/multipdf/PDFMergerUtility.java
At revision 1829160

Parsing POMs
Established TCP socket on 38464
maven35-agent.jar already up to date
maven35-interceptor.jar already up to date
maven3-interceptor-commons.jar already up to date
[trunk] $ /home/jenkins/tools/java/jdk1.8.0_66-unlimited-security/bin/java 
-Xmx1g -XX:MaxPermSize=300m -cp 
/home/jenkins/jenkins-slave/maven35-agent.jar:/home/jenkins/tools/maven/apache-maven-3.5.0/boot/plexus-classworlds-2.5.2.jar:/home/jenkins/tools/maven/apache-maven-3.5.0/conf/logging
 jenkins.maven3.agent.Maven35Main /home/jenkins/tools/maven/apache-maven-3.5.0 
/home/jenkins/jenkins-slave/slave.jar 
/home/jenkins/jenkins-slave/maven35-interceptor.jar 
/home/jenkins/jenkins-slave/maven3-interceptor-commons.jar 38464
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=300m; 
support was removed in 8.0
<===[JENKINS REMOTING CAPACITY]===>   channel started
Executing Maven:  -B -f 
 
-Dmaven.repo.local=/home/jenkins/jenkins-slave/maven-repositories/0 clean 
deploy -Ppedantic -Dskip-bavaria=false
[INFO] Scanning for projects...
[WARNING] The project org.apache.pdfbox:pdfbox-parent:pom:3.0.0-SNAPSHOT uses 
prerequisites which is only intended for maven-plugin projects but not for non 
maven-plugin projects. For such purposes you should use the 
maven-enforcer-plugin. See 
https://maven.apache.org/enforcer/enforcer-rules/requireMavenVersion.html
[INFO] 
[INFO] Reactor Build Order:
[INFO] 
[INFO] PDFBox parent
[INFO] Apache FontBox
[INFO] Apache XmpBox
[INFO] Apache PDFBox
[INFO] Apache Preflight
[INFO] Apache Preflight application
[INFO] Apache PDFBox Debugger
[INFO] Apache PDFBox tools
[INFO] Apache PDFBox application
[INFO] Apache PDFBox Debugger application
[INFO] Apache PDFBox examples
[INFO] Apache PDFBox
[INFO] 
[INFO] 
[INFO] Building PDFBox parent 3.0.0-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-clean-plugin:3.0.0:clean (default-clean) @ pdfbox-parent ---
[TASKS] Scanning folder 
' for files 
matching the pattern '**/*.java' - excludes: 
[TASKS] Found 0 files to scan for tasks
Found 0 open tasks.
[TASKS] Computing warning deltas based on reference build #3966
[INFO] 
[INFO] --- maven-enforcer-plugin:1.4.1:enforce (enforce-maven-version) @ 
pdfbox-parent ---
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.5:process (process-resource-bundles) 
@ pdfbox-parent ---
[INFO] 
[INFO] --- animal-sniffer-maven-plugin:1.15:check (check-java-version) @ 
pdfbox-parent ---
[INFO] Checking unresolved references to org.codehaus.mojo.signature:java17:1.0
[INFO] 
[INFO] --- maven-site-plugin:3.7:attach-descriptor (attach-descriptor) @ 
pdfbox-parent ---
[INFO] No site descriptor found: nothing to attach.
[INFO] 
[INFO] >>> maven-source-plugin:3.0.1:jar (attach-sources) > generate-sources @ 
pdfbox-parent >>>
[INFO] 
[INFO] --- maven-enforcer-plugin:1.4.1:enforce (enforce-maven-version) @ 
pdfbox-parent ---
[WARNING] Failed to getClass for org.apache.maven.plugins.source.SourceJarMojo
[INFO] 
[INFO] <<< maven-source-plugin:3.0.1:jar (attach-sources) < generate-sources @ 
pdfbox-parent <<<
[INFO] 
[INFO] 
[INFO] --- maven-source-plugin:3.0.1:jar (attach-sources) @ pdfbox-parent ---
[INFO] 
[INFO] --- apache-rat-plugin:0.12:check (default) @ pdfbox-parent ---
[INFO] Enabled default license matchers.
[INFO] Will parse SCM ignores for exclusions...
[INFO] Finished adding exclusions from SCM ignore files.
[INFO] 61 implicit excludes (use -debug for more details).
[INFO] Exclude: release.properties
[INFO] 1 resources included (use -debug for more details)
[INFO] Rat check: Summary over all files. Unapproved: 0, unknown: 0, generated: 
0, approved: 1 licenses.
[INFO] 
[INFO] --- dependency-check-maven:3.1.2:check (default) @ pdfbox-parent ---
[INFO] Checking for updates
[INFO] starting getUpdatesNeeded() ...
[ERROR] IO Exception: Connection reset
[ERROR] IO 

Build failed in Jenkins: PDFBox-trunk » PDFBox parent #3970

2018-04-14 Thread Apache Jenkins Server
See 


--
Established TCP socket on 38464
maven35-agent.jar already up to date
maven35-interceptor.jar already up to date
maven3-interceptor-commons.jar already up to date
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=300m; 
support was removed in 8.0
<===[JENKINS REMOTING CAPACITY]===>   channel started
Executing Maven:  -B -f 
/home/jenkins/jenkins-slave/workspace/PDFBox-trunk/trunk/pom.xml 
-Dmaven.repo.local=/home/jenkins/jenkins-slave/maven-repositories/0 clean 
deploy -Ppedantic -Dskip-bavaria=false
[INFO] Scanning for projects...
[WARNING] The project org.apache.pdfbox:pdfbox-parent:pom:3.0.0-SNAPSHOT uses 
prerequisites which is only intended for maven-plugin projects but not for non 
maven-plugin projects. For such purposes you should use the 
maven-enforcer-plugin. See 
https://maven.apache.org/enforcer/enforcer-rules/requireMavenVersion.html
[INFO] 
[INFO] Reactor Build Order:
[INFO] 
[INFO] PDFBox parent
[INFO] Apache FontBox
[INFO] Apache XmpBox
[INFO] Apache PDFBox
[INFO] Apache Preflight
[INFO] Apache Preflight application
[INFO] Apache PDFBox Debugger
[INFO] Apache PDFBox tools
[INFO] Apache PDFBox application
[INFO] Apache PDFBox Debugger application
[INFO] Apache PDFBox examples
[INFO] Apache PDFBox
[INFO] 
[INFO] 
[INFO] Building PDFBox parent 3.0.0-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-clean-plugin:3.0.0:clean (default-clean) @ pdfbox-parent ---
[TASKS] Scanning folder 
'
 for files matching the pattern '**/*.java' - excludes: 
[TASKS] Found 0 files to scan for tasks
Found 0 open tasks.
[TASKS] Computing warning deltas based on reference build #3966
[INFO] 
[INFO] --- maven-enforcer-plugin:1.4.1:enforce (enforce-maven-version) @ 
pdfbox-parent ---
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.5:process (process-resource-bundles) 
@ pdfbox-parent ---
[INFO] 
[INFO] --- animal-sniffer-maven-plugin:1.15:check (check-java-version) @ 
pdfbox-parent ---
[INFO] Checking unresolved references to org.codehaus.mojo.signature:java17:1.0
[INFO] 
[INFO] --- maven-site-plugin:3.7:attach-descriptor (attach-descriptor) @ 
pdfbox-parent ---
[INFO] No site descriptor found: nothing to attach.
[INFO] 
[INFO] >>> maven-source-plugin:3.0.1:jar (attach-sources) > generate-sources @ 
pdfbox-parent >>>
[INFO] 
[INFO] --- maven-enforcer-plugin:1.4.1:enforce (enforce-maven-version) @ 
pdfbox-parent ---
[WARNING] Failed to getClass for org.apache.maven.plugins.source.SourceJarMojo
[INFO] 
[INFO] <<< maven-source-plugin:3.0.1:jar (attach-sources) < generate-sources @ 
pdfbox-parent <<<
[INFO] 
[INFO] 
[INFO] --- maven-source-plugin:3.0.1:jar (attach-sources) @ pdfbox-parent ---
[INFO] 
[INFO] --- apache-rat-plugin:0.12:check (default) @ pdfbox-parent ---
[INFO] Enabled default license matchers.
[INFO] Will parse SCM ignores for exclusions...
[INFO] Finished adding exclusions from SCM ignore files.
[INFO] 61 implicit excludes (use -debug for more details).
[INFO] Exclude: release.properties
[INFO] 1 resources included (use -debug for more details)
[INFO] Rat check: Summary over all files. Unapproved: 0, unknown: 0, generated: 
0, approved: 1 licenses.
[INFO] 
[INFO] --- dependency-check-maven:3.1.2:check (default) @ pdfbox-parent ---
[INFO] Checking for updates
[INFO] starting getUpdatesNeeded() ...
[ERROR] IO Exception: Connection reset
[ERROR] IO Exception: Connection reset
[ERROR] IO Exception: Connection reset
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: Connection reset
[ERROR] IO Exception: connect timed out
[WARNING] Unable to download the NVD CVE data; the results may not include the 
most recent CPE/CVEs from the NVD.
[INFO] If you are behind a proxy you may need to configure dependency-check to 
use the proxy.
[WARNING] Unable to update Cached Web DataSource, using local data instead. 
Results may not include recent vulnerabilities.
[INFO] Analysis Started
[INFO] Finished File Name Analyzer (0 seconds)
[INFO] Finished Dependency Merging Analyzer (0 seconds)
[INFO] Finished Version Filter Analyzer (0 seconds)
[INFO] Finished Hint Analyzer (0 seconds)
[INFO] Created CPE Index (1 seconds)
[INFO] Skipping CPE Analysis for npm
[INFO] Finished CPE Analyzer (1 seconds)
[INFO] Finished False Positive Analyzer (0 seconds)
[INFO] Finished NVD CVE Analyzer (0 seconds)
[INFO] Finished Vulnerability Suppression Analyzer (0 seconds)
[INFO] Finished Dependency Bundling Analyzer (0 seconds)
[INFO] Analysis Complete (1 seconds)
[ERROR] IO Exception: Connection reset

[jira] [Commented] (PDFBOX-4182) Improve memory usage of PDFMergerUtility

2018-04-14 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438441#comment-16438441
 ] 

ASF subversion and git services commented on PDFBOX-4182:
-

Commit 1829159 from [~msahyoun] in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1829159 ]

PDFBOX-4182, PDFBOX-4188: correct javadoc

> Improve memory usage of PDFMergerUtility
> 
>
> Key: PDFBOX-4182
> URL: https://issues.apache.org/jira/browse/PDFBOX-4182
> Project: PDFBox
>  Issue Type: Improvement
>Affects Versions: 2.0.9
>Reporter: Pas Filip
>Priority: Major
> Attachments: PDFMergerUtilityUsingSupplier.java, Supplier.java, 
> Suppliers.java, 
> failed-merge-utility-4gb-heap-out-of-memory-after-1800-pdfs.png, 
> merge-pdf-stats.xlsx, merge-utility.patch, 
> oom-2gb-heap-after-refactoring-leak-suspect-1.png, 
> oom-2gb-heap-after-refactoring-leak-suspect-2.png, successful - 
> refactored-merge-utility-4gb-heap-2618-files-merged.png, successful 
> -merge-utility-6gb-heap-2618-files-merged.png, 
> successful-merge-utility-6gb-heap-2618-files-merged-setupTempFileOnly.png, 
> successful-merge-utility-8gb-heap-2618-files-merged.png, 
> successful-refactored-merge-utility-4gb-heap-2618-files-merged-setupTempFileOnly.png
>
>
> I have been running some tests trying to merge a large number (2618) of small 
> PDF documents, between 100 KB and 130 KB each, into a single large PDF (288.433 KB).
> Memory consumption seems to be the main limitation.
> ScratchFileBuffer seems to account for the majority of the memory usage
> (see the screenshot from MAT in the attachments).
> (I would attach the hprof so you can analyze it yourselves, but 
> it's rather large.)
> Note that it seems impossible to generate a large PDF using a small memory 
> footprint.
> I personally thought that using MemoryUsageSetting with temporary file only would 
> allow me to generate arbitrarily large PDF files, but it doesn't seem to help.
> I've run mergeDocuments with the following memory settings:
>  * MemoryUsageSetting.setupMixed(1024L * 1024L, 1024L * 1024L * 1024L * 1024L 
> * 1024L)
>  * MemoryUsageSetting.setupTempFileOnly()
> Refactored version completes with *4GB* heap:
> with temp file only completes 2618 documents in 1.760 min
> *VS*
> *8GB* heap:
> with temp file only completes 2618 documents in 2.0 min
> Heaps of 6gb or less result in OOM. (Didn't try different sizes between 6GB 
> and 8GB)
>  It looks like the loop in mergeDocuments accumulates PDDocument objects 
> in a list, and these are only closed after the merge is completed.
> Refactoring the code to close these as they are used, instead of accumulating 
> them and closing all at the end, improves memory usage considerably (although 
> the problem doesn't seem to be eliminated completely, based on the MAT analysis).
> Another change I've implemented is to only create the InputStream when the 
> file needs to be read and to close it alongside the PDDocument.
> (Some input streams contain buffers and, depending on the size of the buffers 
> and/or the stream type, accumulating all the streams is a potential 
> memory hog.)
> These changes seem to be beneficial, in the sense that I can 
> process the same number of PDFs with about half the memory.
>  I'd appreciate it if you could roll these changes into the main codebase.
> (I've respected Java 6 compatibility.)
> I've attached the Java files of the new implementation:
>  * Suppliers
>  * Supplier
>  * PDFMergerUtilityUsingSupplier
> PDFMergerUtilityUsingSupplier can replace the previous version. There are no 
> signature changes, only internal code changes. (Just rename the class to 
> PDFMergerUtility if you decide to implement the changes.)
>  The attachments also include some screenshots from VisualVM showing the 
> memory usage of the original version and the refactored version, as well as 
> some info produced by MAT after analysing the heap.
> If you know of any other means, without running into memory issues, to merge 
> large sets of pdf files into a large single pdf I'd love to hear about it!
> I'd also suggest that further improvements be made to memory 
> usage in general, as PDFBox seems to consume a lot of memory.
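
The refactoring described in the issue above (close each source as soon as it has been merged, instead of collecting every open document and closing them all at the end) can be illustrated with a minimal sketch. It deliberately uses no PDFBox classes: OpenedSource, SourceSupplier and CloseAsYouGoSketch are hypothetical names invented for this illustration, and the "merge" step is only a comment. It shows the difference in the number of simultaneously open sources, not the attached patch itself.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an opened source document; not a PDFBox type. */
interface OpenedSource {
    void close();
}

/** Hypothetical supplier that opens a source only when asked to. */
interface SourceSupplier {
    OpenedSource open();
}

final class CloseAsYouGoSketch {

    /** Counts how many sources are open at once, so both strategies can be compared. */
    static final class OpenCounter {
        int open;
        int peak;

        OpenedSource track() {
            open++;
            peak = Math.max(peak, open);
            return new OpenedSource() {
                private boolean closed;

                @Override
                public void close() {
                    if (!closed) {
                        closed = true;
                        open--;
                    }
                }
            };
        }
    }

    /** Pattern described as the original behaviour: open everything, close at the end. */
    static void mergeAccumulating(List<SourceSupplier> sources) {
        List<OpenedSource> opened = new ArrayList<OpenedSource>();
        try {
            for (SourceSupplier source : sources) {
                opened.add(source.open()); // a real merge would append the pages here
            }
        } finally {
            for (OpenedSource doc : opened) {
                doc.close();
            }
        }
    }

    /** Refactored pattern: open lazily and close right after the source is merged. */
    static void mergeClosingEarly(List<SourceSupplier> sources) {
        for (SourceSupplier source : sources) {
            OpenedSource doc = source.open();
            try {
                // a real merge would append the pages here
            } finally {
                doc.close();
            }
        }
    }

    /** Builds 'count' suppliers that all report to the same counter. */
    static List<SourceSupplier> suppliers(final OpenCounter counter, int count) {
        List<SourceSupplier> list = new ArrayList<SourceSupplier>();
        for (int i = 0; i < count; i++) {
            list.add(new SourceSupplier() {
                @Override
                public OpenedSource open() {
                    return counter.track();
                }
            });
        }
        return list;
    }

    public static void main(String[] args) {
        OpenCounter accumulating = new OpenCounter();
        mergeAccumulating(suppliers(accumulating, 2618));

        OpenCounter closingEarly = new OpenCounter();
        mergeClosingEarly(suppliers(closingEarly, 2618));

        System.out.println("peak open sources, accumulating:  " + accumulating.peak);
        System.out.println("peak open sources, closing early: " + closingEarly.peak);
    }
}
```

Whether a source can really be closed before the destination document is saved depends on how the merge copies objects, so this should be read as an illustration of the resource pattern only.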



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Commented] (PDFBOX-4188) "Maximum allowed scratch file memory exceeded." Exception when merging large number of small PDFs

2018-04-14 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438440#comment-16438440
 ] 

ASF subversion and git services commented on PDFBOX-4188:
-

Commit 1829158 from [~msahyoun] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1829158 ]

PDFBOX-4182, PDFBOX-4188: correct javadoc

>  "Maximum allowed scratch file memory exceeded." Exception when merging large 
> number of small PDFs
> --
>
> Key: PDFBOX-4188
> URL: https://issues.apache.org/jira/browse/PDFBOX-4188
> Project: PDFBox
>  Issue Type: Improvement
>Affects Versions: 2.0.9, 3.0.0 PDFBox
>Reporter: Gary Potagal
>Priority: Major
> Attachments: PDFBOX-4188-MemoryManagerPatch.zip, 
> PDFBOX-4188-breakingTest.zip, PDFMergerUtility.java-20180412.patch
>
>
>  
> On 06.04.2018 at 23:10, Gary Potagal wrote:
>  
> We wanted to address one more merge issue in 
> org.apache.pdfbox.multipdf.PDFMergerUtility#mergeDocuments(org.apache.pdfbox.io.MemoryUsageSetting).
> We need to merge a large number of small files.  We use mixed mode, memory 
> and disk for cache.  Initially, we would often get "Maximum allowed scratch 
> file memory exceeded.", unless we turned off the check by passing "-1" to 
> org.apache.pdfbox.io.MemoryUsageSetting#MemoryUsageSetting.  I believe this 
> is what the users who opened PDFBOX-3721 were running into.
> Our research indicates that the core issue with the memory model is that 
> instead of sharing a single cache, it breaks it up into equal sized fixed 
> partitions based on the number of input + output files being merged.  This 
> means that each partition must be big enough to hold the final output file.  
> When 400 1-page files are merged, this creates 401 partitions, each of 
> which needs to be big enough to hold the final 400 pages.  Even worse, the 
> merge algorithm needs to keep all files open until the end.
> Given this, near the end of the merge, we're actually caching 400 x 1-page 
> input files, and 1 x 400-page output file, or 800 pages.
> However, with the partitioned cache, we need to declare room for 401  x 
> 400-pages, or 160,400 pages in total when specifying "maxStorageBytes".  This 
> would be a very high number, usually in GIGs.
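The sizing argument above can be checked with a few lines of arithmetic. This is only a hedged sketch: the class and method names are invented for illustration and are not PDFBox API, and it counts abstract "pages" the same way the message does. The first method counts the 400 input pages plus the 400 output pages that are resident near the end of the merge; the second multiplies the 401 partitions by the 400 pages each one has to be able to hold.

```java
// Back-of-the-envelope model of the cache sizing described in the message.
// All names here are hypothetical; nothing in this class is PDFBox API.
final class MergeCacheEstimate {

    /** Pages resident near the end of the merge: every input plus the output. */
    static long pagesActuallyCached(int inputFiles, int pagesPerInput) {
        long inputPages = (long) inputFiles * pagesPerInput;
        long outputPages = inputPages; // the merged file holds every input page
        return inputPages + outputPages;
    }

    /** Pages to budget when every partition must be able to hold the whole output. */
    static long pagesToDeclare(int inputFiles, int pagesPerInput) {
        long partitions = inputFiles + 1L; // one per input, plus one for the output
        long outputPages = (long) inputFiles * pagesPerInput;
        return partitions * outputPages;
    }

    public static void main(String[] args) {
        System.out.println("resident pages: " + pagesActuallyCached(400, 1));
        System.out.println("declared pages: " + pagesToDeclare(400, 1)); // 401 * 400 = 160400
    }
}
```

The gap between the two numbers grows roughly with the square of the number of input files, which is why the limit is hit so quickly when merging many small documents.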
>  
> Given the current limitation that we need to keep all the input files open 
> until the output file is written (HUGE), we came up with 2 options.  (See 
> PDFBOX-4182)  
>  
> 1.  Good: Split the cache in ½, give ½ to the output file, and segment the 
> other ½ across the input files. (Still keeping them open until the end.)
> 2.  Better: Dynamically allocate in 16 page (64K) chunks from memory or disk 
> on demand, release cache as documents are closed after merge.  This is our 
> current implementation till PDFBOX-3999, PDFBOX-4003 and PDFBOX-4004 are 
> addressed.
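To make option 2 concrete, here is a toy model of a single shared budget that is handed out in 16-page chunks and given back when a document is closed. It is a sketch under stated assumptions, not the submitted patch: `SharedChunkCache`, `grow` and `release` are hypothetical names, and the only figure taken from the message is the chunk size (16 pages per 64K chunk, i.e. 4 KB per page).

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of option 2: one shared cache budget, allocated on demand in
// fixed-size chunks and released when a document is closed.
// Hypothetical names throughout; this is not PDFBox API.
final class SharedChunkCache {
    static final int PAGES_PER_CHUNK = 16; // "16 page (64K) chunks" => 4 KB per page

    private final long maxChunks;
    private long usedChunks;
    private final Map<String, Long> pagesByOwner = new HashMap<String, Long>();

    SharedChunkCache(long maxChunks) {
        this.maxChunks = maxChunks;
    }

    private static long chunksFor(long pages) {
        return (pages + PAGES_PER_CHUNK - 1) / PAGES_PER_CHUNK; // round up
    }

    /** Grows an owner's allocation, taking new chunks only when a boundary is crossed. */
    void grow(String owner, long extraPages) {
        Long held = pagesByOwner.get(owner);
        long before = held == null ? 0L : held;
        long after = before + extraPages;
        long delta = chunksFor(after) - chunksFor(before);
        if (usedChunks + delta > maxChunks) {
            throw new IllegalStateException("shared cache budget exceeded");
        }
        usedChunks += delta;
        pagesByOwner.put(owner, after);
    }

    /** Returns all of an owner's chunks to the shared budget. */
    void release(String owner) {
        Long held = pagesByOwner.remove(owner);
        if (held != null) {
            usedChunks -= chunksFor(held);
        }
    }

    long usedChunks() {
        return usedChunks;
    }

    public static void main(String[] args) {
        SharedChunkCache cache = new SharedChunkCache(26);
        for (int i = 0; i < 400; i++) {
            String input = "input-" + i;
            cache.grow(input, 1);    // open a 1-page input
            cache.grow("output", 1); // copy its page into the output
            cache.release(input);    // close the input right after its merge
        }
        System.out.println(cache.usedChunks()); // 25 chunks hold the 400-page output
    }
}
```

In this model a budget of 26 chunks is enough for the 400-file example (25 for the output plus one for the input currently open), because released chunks go back to the shared pool instead of staying tied to a fixed partition.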
>  
> We would like to submit our current implementation as a Patch to 2.0.10 and 
> 3.0.0, unless this is already addressed.
>  
>  Thank you






[jira] [Commented] (PDFBOX-4182) Improve memory usage of PDFMergerUtility

2018-04-14 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438439#comment-16438439
 ] 

ASF subversion and git services commented on PDFBOX-4182:
-

Commit 1829158 from [~msahyoun] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1829158 ]

PDFBOX-4182, PDFBOX-4188: correct javadoc

> Improve memory usage of PDFMergerUtility
> 
>
> Key: PDFBOX-4182
> URL: https://issues.apache.org/jira/browse/PDFBOX-4182
> Project: PDFBox
>  Issue Type: Improvement
>Affects Versions: 2.0.9
>Reporter: Pas Filip
>Priority: Major
> Attachments: PDFMergerUtilityUsingSupplier.java, Supplier.java, 
> Suppliers.java, 
> failed-merge-utility-4gb-heap-out-of-memory-after-1800-pdfs.png, 
> merge-pdf-stats.xlsx, merge-utility.patch, 
> oom-2gb-heap-after-refactoring-leak-suspect-1.png, 
> oom-2gb-heap-after-refactoring-leak-suspect-2.png, successful - 
> refactored-merge-utility-4gb-heap-2618-files-merged.png, successful 
> -merge-utility-6gb-heap-2618-files-merged.png, 
> successful-merge-utility-6gb-heap-2618-files-merged-setupTempFileOnly.png, 
> successful-merge-utility-8gb-heap-2618-files-merged.png, 
> successful-refactored-merge-utility-4gb-heap-2618-files-merged-setupTempFileOnly.png
>
>
> I have been running some tests trying to merge large amounts (2618) of small 
> pdf documents, between 100kb and 130kb, into a single large pdf (288.433kb)
> Memory consumption seems to be the main limitation.
> ScratchFileBuffer seems to consume the majority of the memory usage.
> (see screenshot from mat in attachment)
> (I would include the hprof in attachment so you can analyze yourselves but 
> it's rather large)
> Note that it seems impossible to generate a large pdf using a small memory 
> footprint.
> I personally thought that using MemoryUsageSetting with temp file only would 
> allow me to generate arbitrarily large pdf files but it doesn't seem to help.
> I've run the mergeDocuments with  memory settings:
>  * MemoryUsageSetting.setupMixed(1024L * 1024L, 1024L * 1024L * 1024L * 1024L 
> * 1024L)
>  * MemoryUsageSetting.setupTempFileOnly()
> Refactored version completes with *4GB* heap:
> with temp file only completes 2618 documents in 1.760 min
> *VS*
> *8GB* heap:
> with temp file only completes 2618 documents in 2.0 min
> Heaps of 6gb or less result in OOM. (Didn't try different sizes between 6GB 
> and 8GB)
>  It looks like the loop in the mergeDocuments accumulates PDDocument objects 
> in a list which are closed after the merge is completed.
> Refactoring the code to close these as they are used, instead of accumulating 
> them and closing all at the end, improves memory usage considerably (although 
> the problem doesn't seem to be eliminated completely, based on the MAT analysis).
> Another change I've implemented is to only create the inputstream when the 
> file needs to be read and to close it alongside the PDDocument.
> (Some inputstreams contain buffers and depending on the size of the buffers 
> and or the stream type accumulating all the streams is a potential 
> memory-hog.)
> These changes seem to be beneficial, in the sense that I can 
> process the same number of pdfs with about half the memory.
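The difference between the two lifetimes can be illustrated without PDFBox at all. The sketch below is an assumption-laden model rather than the attached code: `FakeDocument` stands in for a PDDocument plus its input stream, and the small `Supplier` interface is a hypothetical stand-in for the attached Supplier.java (Java 6 has no java.util.function). It only tracks how many documents are open at once; it does not merge anything.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;

// Models resource lifetime only: how many sources are open at the same time.
// All names are hypothetical stand-ins, not PDFBox API.
final class CloseAsYouGoDemo {

    /** Defers opening a source until the merge loop actually needs it. */
    interface Supplier<T> {
        T get();
    }

    /** Stand-in for a PDDocument and its stream; counts concurrently open instances. */
    static final class FakeDocument implements Closeable {
        static int open;
        static int peakOpen;

        FakeDocument() {
            open++;
            peakOpen = Math.max(peakOpen, open);
        }

        public void close() {
            open--;
        }
    }

    /** Original pattern: keep every source open and close them all at the end. */
    static int mergeAccumulating(List<Supplier<FakeDocument>> sources) {
        FakeDocument.open = 0;
        FakeDocument.peakOpen = 0;
        List<FakeDocument> opened = new ArrayList<FakeDocument>();
        try {
            for (Supplier<FakeDocument> source : sources) {
                opened.add(source.get()); // the append to the destination would go here
            }
        } finally {
            for (FakeDocument doc : opened) {
                doc.close();
            }
        }
        return FakeDocument.peakOpen;
    }

    /** Refactored pattern: open lazily, close right after the individual merge. */
    static int mergeClosingEagerly(List<Supplier<FakeDocument>> sources) {
        FakeDocument.open = 0;
        FakeDocument.peakOpen = 0;
        for (Supplier<FakeDocument> source : sources) {
            FakeDocument doc = source.get();
            try {
                // the append to the destination would go here
            } finally {
                doc.close();
            }
        }
        return FakeDocument.peakOpen;
    }

    static List<Supplier<FakeDocument>> sources(int count) {
        List<Supplier<FakeDocument>> list = new ArrayList<Supplier<FakeDocument>>();
        for (int i = 0; i < count; i++) {
            list.add(new Supplier<FakeDocument>() {
                public FakeDocument get() {
                    return new FakeDocument();
                }
            });
        }
        return list;
    }

    public static void main(String[] args) {
        System.out.println("accumulating peak: " + mergeAccumulating(sources(2618)));
        System.out.println("eager-close peak:  " + mergeClosingEagerly(sources(2618)));
    }
}
```

With 2618 sources the accumulating loop peaks at 2618 open documents while the eager loop never holds more than one. The static counters keep the demo short; whether a real source document can safely be closed before the destination is saved depends on the library, so this says nothing about that.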
>  I'd appreciate it if you could roll these changes into the main codebase.
> (I've respected java 6 compatibility.)
> I've included in attachment the java files of the new implementation:
>  * Suppliers
>  * Supplier
>  * PDFMergerUtilityUsingSupplier
> PDFMergerUtilityUsingSupplier can replace the previous version: there are no 
> signature changes, only internal code changes. (Just rename the class to 
> PDFMergerUtility if you decide to implement the changes.)
>  In attachment you can also find some screenshots from visualvm showing the 
> memory usage of the original version and the refactored version as well as 
> some info produced by mat after analysing the heap.
> If you know of any other means, without running into memory issues, to merge 
> large sets of pdf files into a large single pdf I'd love to hear about it!
> I'd also suggest that further improvements be made to memory usage in 
> general, as PDFBox seems to consume a lot of memory.






[jira] [Commented] (PDFBOX-4188) "Maximum allowed scratch file memory exceeded." Exception when merging large number of small PDFs

2018-04-14 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438442#comment-16438442
 ] 

ASF subversion and git services commented on PDFBOX-4188:
-

Commit 1829159 from [~msahyoun] in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1829159 ]

PDFBOX-4182, PDFBOX-4188: correct javadoc







[jira] [Commented] (PDFBOX-4188) "Maximum allowed scratch file memory exceeded." Exception when merging large number of small PDFs

2018-04-14 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438437#comment-16438437
 ] 

ASF subversion and git services commented on PDFBOX-4188:
-

Commit 1829156 from [~msahyoun] in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1829156 ]

PDFBOX-4182, PDFBOX-4188: add new merge mode which closes the source PDDocument 
after the individual merge; early implementation







[jira] [Commented] (PDFBOX-4182) Improve memory usage of PDFMergerUtility

2018-04-14 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438436#comment-16438436
 ] 

ASF subversion and git services commented on PDFBOX-4182:
-

Commit 1829156 from [~msahyoun] in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1829156 ]

PDFBOX-4182, PDFBOX-4188: add new merge mode which closes the source PDDocument 
after the individual merge; early implementation







Build failed in Jenkins: PDFBox-2.0.x #983

2018-04-14 Thread Apache Jenkins Server
See 


Changes:

[msahyoun] PDFBOX-3809: flatten only specified fields

--
[...truncated 151.04 KB...]
A fontbox/src/main/resources/org/apache/fontbox/cmap/HKscs-B5-V
A fontbox/src/main/resources/org/apache/fontbox/cmap/90ms-RKSJ-H
A fontbox/src/main/resources/org/apache/fontbox/cmap/GBpc-EUC-V
A fontbox/src/main/resources/org/apache/fontbox/cmap/B5pc-V
A fontbox/src/main/resources/org/apache/fontbox/cmap/Ext-RKSJ-H
A fontbox/src/main/resources/org/apache/fontbox/cmap/EUC-H
A fontbox/src/main/resources/org/apache/fontbox/cmap/90ms-RKSJ-V
A fontbox/src/main/resources/org/apache/fontbox/cmap/Adobe-CNS1-0
A fontbox/src/main/resources/org/apache/fontbox/cmap/Ext-RKSJ-V
A fontbox/src/main/resources/org/apache/fontbox/cmap/Adobe-CNS1-1
A fontbox/src/main/resources/org/apache/fontbox/cmap/Adobe-CNS1-2
A fontbox/src/main/resources/org/apache/fontbox/cmap/Adobe-CNS1-3
A fontbox/src/main/resources/org/apache/fontbox/cmap/Adobe-CNS1-4
A fontbox/src/main/resources/org/apache/fontbox/cmap/Adobe-CNS1-5
A fontbox/src/main/resources/org/apache/fontbox/cmap/KSC-EUC-H
A fontbox/src/main/resources/org/apache/fontbox/cmap/EUC-V
A fontbox/src/main/resources/org/apache/fontbox/cmap/Adobe-Japan1-0
A fontbox/src/main/resources/org/apache/fontbox/cmap/Adobe-CNS1-6
A fontbox/src/main/resources/org/apache/fontbox/cmap/Adobe-Japan1-1
A fontbox/src/main/resources/org/apache/fontbox/cmap/Adobe-Japan1-2
A fontbox/src/main/resources/org/apache/fontbox/cmap/Adobe-Japan1-3
A fontbox/src/main/resources/org/apache/fontbox/cmap/Adobe-Japan1-4
A fontbox/src/main/resources/org/apache/fontbox/cmap/Adobe-Japan1-5
A fontbox/src/main/resources/org/apache/fontbox/cmap/ETen-B5-H
A fontbox/src/main/resources/org/apache/fontbox/cmap/Adobe-Japan1-6
A fontbox/src/main/resources/org/apache/fontbox/cmap/GBK2K-H
A fontbox/src/main/resources/org/apache/fontbox/cmap/Adobe-CNS1-UCS2
A fontbox/src/main/resources/org/apache/fontbox/cmap/UniKS-UCS2-H
A fontbox/src/main/resources/org/apache/fontbox/cmap/90pv-RKSJ-H
A fontbox/src/main/resources/org/apache/fontbox/cmap/UniGB-UTF16-H
A fontbox/src/main/resources/org/apache/fontbox/cmap/KSC-EUC-V
A fontbox/src/main/resources/org/apache/fontbox/cmap/KSCpc-EUC-H
A fontbox/src/main/resources/org/apache/fontbox/cmap/GBK-EUC-H
A fontbox/src/main/resources/org/apache/fontbox/cmap/ETen-B5-V
A fontbox/src/main/resources/org/apache/fontbox/cmap/GBK2K-V
A fontbox/src/main/resources/org/apache/fontbox/cmap/UniGB-UCS2-H
A fontbox/src/main/resources/org/apache/fontbox/cmap/90msp-RKSJ-H
A fontbox/src/main/resources/org/apache/fontbox/cmap/Adobe-Korea1-UCS2
A fontbox/src/main/resources/org/apache/fontbox/cmap/UniKS-UCS2-V
A fontbox/src/main/resources/org/apache/fontbox/cmap/90pv-RKSJ-V
A fontbox/src/main/resources/org/apache/fontbox/cmap/UniGB-UTF16-V
A fontbox/src/main/resources/org/apache/fontbox/cmap/UniJIS-UCS2-HW-H
A fontbox/src/main/resources/org/apache/fontbox/cmap/Identity-H
A fontbox/src/main/resources/org/apache/fontbox/cmap/KSCpc-EUC-V
A fontbox/src/main/resources/org/apache/fontbox/cmap/GBK-EUC-V
A fontbox/src/main/resources/org/apache/fontbox/cmap/UniGB-UCS2-V
A fontbox/src/main/resources/org/apache/fontbox/cmap/UniKS-UTF16-H
A fontbox/src/main/resources/org/apache/fontbox/cmap/GBKp-EUC-H
A fontbox/src/main/resources/org/apache/fontbox/cmap/KSCms-UHC-H
A fontbox/src/main/resources/org/apache/fontbox/cmap/90msp-RKSJ-V
A fontbox/src/main/resources/org/apache/fontbox/cmap/CNS-EUC-H
A fontbox/src/main/resources/org/apache/fontbox/cmap/H
A fontbox/src/main/resources/org/apache/fontbox/cmap/UniCNS-UTF16-H
A fontbox/src/main/resources/org/apache/fontbox/cmap/UniJIS-UCS2-HW-V
A fontbox/src/main/resources/org/apache/fontbox/cmap/Adobe-GB1-UCS2
A fontbox/src/main/resources/org/apache/fontbox/cmap/83pv-RKSJ-H
A fontbox/src/main/resources/org/apache/fontbox/cmap/Identity-V
A fontbox/src/main/resources/org/apache/fontbox/cmap/UniKS-UTF16-V
A fontbox/src/main/resources/org/apache/fontbox/cmap/KSCms-UHC-V
A fontbox/src/main/resources/org/apache/fontbox/cmap/UniJIS-UCS2-H
A fontbox/src/main/resources/org/apache/fontbox/cmap/CNS-EUC-V
A fontbox/src/main/resources/org/apache/fontbox/cmap/GBKp-EUC-V
A fontbox/src/main/resources/org/apache/fontbox/cmap/V
A fontbox/src/main/resources/org/apache/fontbox/cmap/UniCNS-UTF16-V
A fontbox/src/main/resources/org/apache/fontbox/cmap/Adobe-GB1-0
A 

Build failed in Jenkins: PDFBox-2.0.x » PDFBox parent #983

2018-04-14 Thread Apache Jenkins Server
See 


--
Established TCP socket on 37447
maven35-agent.jar already up to date
maven35-interceptor.jar already up to date
maven3-interceptor-commons.jar already up to date
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=300m; 
support was removed in 8.0
<===[JENKINS REMOTING CAPACITY]===>   channel started
Executing Maven:  -B -f 
/home/jenkins/jenkins-slave/workspace/PDFBox-2.0.x/2.0/pom.xml 
-Dmaven.repo.local=/home/jenkins/jenkins-slave/maven-repositories/0 clean 
deploy -Ppedantic -Dskip-bavaria=false
[INFO] Scanning for projects...
[WARNING] The project org.apache.pdfbox:pdfbox-parent:pom:2.0.10-SNAPSHOT uses 
prerequisites which is only intended for maven-plugin projects but not for non 
maven-plugin projects. For such purposes you should use the 
maven-enforcer-plugin. See 
https://maven.apache.org/enforcer/enforcer-rules/requireMavenVersion.html
[INFO] 
[INFO] Reactor Build Order:
[INFO] 
[INFO] PDFBox parent
[INFO] Apache FontBox
[INFO] Apache XmpBox
[INFO] Apache PDFBox
[INFO] Apache Preflight
[INFO] Apache Preflight application
[INFO] Apache PDFBox Debugger
[INFO] Apache PDFBox tools
[INFO] Apache PDFBox application
[INFO] Apache PDFBox Debugger application
[INFO] Apache PDFBox examples
[INFO] PDFBox reactor
[INFO] 
[INFO] 
[INFO] Building PDFBox parent 2.0.10-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-clean-plugin:3.0.0:clean (default-clean) @ pdfbox-parent ---
[TASKS] Scanning folder 
'
 for files matching the pattern '**/*.java' - excludes: 
[TASKS] Found 0 files to scan for tasks
Found 0 open tasks.
[TASKS] Computing warning deltas based on reference build #982
[INFO] 
[INFO] --- maven-enforcer-plugin:1.4.1:enforce (enforce-maven-version) @ 
pdfbox-parent ---
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.5:process (process-resource-bundles) 
@ pdfbox-parent ---
[INFO] 
[INFO] --- animal-sniffer-maven-plugin:1.14:check (check-java-version) @ 
pdfbox-parent ---
[INFO] Checking unresolved references to org.codehaus.mojo.signature:java16:1.0
[INFO] 
[INFO] --- maven-site-plugin:3.7:attach-descriptor (attach-descriptor) @ 
pdfbox-parent ---
[INFO] No site descriptor found: nothing to attach.
[INFO] 
[INFO] >>> maven-source-plugin:3.0.1:jar (attach-sources) > generate-sources @ 
pdfbox-parent >>>
[INFO] 
[INFO] --- maven-enforcer-plugin:1.4.1:enforce (enforce-maven-version) @ 
pdfbox-parent ---
[WARNING] Failed to getClass for org.apache.maven.plugins.source.SourceJarMojo
[INFO] 
[INFO] <<< maven-source-plugin:3.0.1:jar (attach-sources) < generate-sources @ 
pdfbox-parent <<<
[INFO] 
[INFO] 
[INFO] --- maven-source-plugin:3.0.1:jar (attach-sources) @ pdfbox-parent ---
[INFO] 
[INFO] --- apache-rat-plugin:0.12:check (default) @ pdfbox-parent ---
[INFO] Enabled default license matchers.
[INFO] Will parse SCM ignores for exclusions...
[INFO] Finished adding exclusions from SCM ignore files.
[INFO] 61 implicit excludes (use -debug for more details).
[INFO] Exclude: release.properties
[INFO] 1 resources included (use -debug for more details)
[INFO] Rat check: Summary over all files. Unapproved: 0, unknown: 0, generated: 
0, approved: 1 licenses.
[INFO] 
[INFO] --- dependency-check-maven:3.1.2:check (default) @ pdfbox-parent ---
[INFO] Checking for updates
[INFO] starting getUpdatesNeeded() ...
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[WARNING] Unable to download the NVD CVE data; the results may not include the 
most recent CPE/CVEs from the NVD.
[INFO] If you are behind a proxy you may need to configure dependency-check to 
use the proxy.
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[WARNING] Unable to update Cached Web DataSource, using local data instead. 
Results may not include recent vulnerabilities.
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[INFO] Analysis Started
[INFO] Finished File Name Analyzer (0 seconds)
[INFO] Finished Dependency Merging Analyzer (0 seconds)
[INFO] Finished Version Filter Analyzer (0 seconds)
[INFO] Finished Hint Analyzer (0 seconds)

Build failed in Jenkins: PDFBox-Trunk-jdk9 #429

2018-04-14 Thread Apache Jenkins Server
See 


Changes:

[msahyoun] PDFBOX-4182, PDFBOX-4188: add new merge mode which closes the source 
PDDocument after the individual merge; early implementation

--
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building remotely on H21 (ubuntu xenial) in workspace 

Cleaning up 
Updating http://svn.apache.org/repos/asf/pdfbox/trunk at revision 
'2018-04-14T17:27:09.771 +'
U pdfbox/src/main/java/org/apache/pdfbox/multipdf/PDFMergerUtility.java
At revision 1829155

Parsing POMs
Established TCP socket on 34090
maven35-agent.jar already up to date
maven35-interceptor.jar already up to date
maven3-interceptor-commons.jar already up to date
[trunk] $ /home/jenkins/tools/java/jdk-9-b181-unlimited-security/bin/java 
-Xmx1g -XX:MaxPermSize=300m -cp 
/home/jenkins/jenkins-slave/maven35-agent.jar:/home/jenkins/tools/maven/apache-maven-3.5.0/boot/plexus-classworlds-2.5.2.jar:/home/jenkins/tools/maven/apache-maven-3.5.0/conf/logging
 jenkins.maven3.agent.Maven35Main /home/jenkins/tools/maven/apache-maven-3.5.0 
/home/jenkins/jenkins-slave/slave.jar 
/home/jenkins/jenkins-slave/maven35-interceptor.jar 
/home/jenkins/jenkins-slave/maven3-interceptor-commons.jar 34090
Java HotSpot(TM) 64-Bit Server VM warning: Ignoring option MaxPermSize; support 
was removed in 8.0
<===[JENKINS REMOTING CAPACITY]===>   channel started
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by hudson.remoting.RemoteClassLoader 
(file:/home/jenkins/jenkins-slave/slave.jar) to method 
java.lang.ClassLoader.getClassLoadingLock(java.lang.String)
WARNING: Please consider reporting this to the maintainers of 
hudson.remoting.RemoteClassLoader
WARNING: Use --illegal-access=warn to enable warnings of further illegal 
reflective access operations
WARNING: All illegal access operations will be denied in a future release
Executing Maven:  -B -f 
 
-Dmaven.repo.local=/home/jenkins/jenkins-slave/maven-repositories/0 clean 
install -Ppedantic,jdk9 -Dskip-bavaria=false
[INFO] Scanning for projects...
[WARNING] The project org.apache.pdfbox:pdfbox-parent:pom:3.0.0-SNAPSHOT uses 
prerequisites which is only intended for maven-plugin projects but not for non 
maven-plugin projects. For such purposes you should use the 
maven-enforcer-plugin. See 
https://maven.apache.org/enforcer/enforcer-rules/requireMavenVersion.html
[INFO] 
[INFO] Reactor Build Order:
[INFO] 
[INFO] PDFBox parent
[INFO] Apache FontBox
[INFO] Apache XmpBox
[INFO] Apache PDFBox
[INFO] Apache Preflight
[INFO] Apache Preflight application
[INFO] Apache PDFBox Debugger
[INFO] Apache PDFBox tools
[INFO] Apache PDFBox application
[INFO] Apache PDFBox Debugger application
[INFO] Apache PDFBox examples
[INFO] Apache PDFBox
[INFO] 
[INFO] 
[INFO] Building PDFBox parent 3.0.0-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-clean-plugin:3.0.0:clean (default-clean) @ pdfbox-parent ---
[TASKS] Scanning folder 
' for files 
matching the pattern '**/*.java' - excludes: 
[TASKS] Found 0 files to scan for tasks
Found 0 open tasks.
[TASKS] Computing warning deltas based on reference build #428
[INFO] 
[INFO] --- maven-enforcer-plugin:1.4.1:enforce (enforce-maven-version) @ 
pdfbox-parent ---
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.5:process (process-resource-bundles) 
@ pdfbox-parent ---
[INFO] 
[INFO] --- animal-sniffer-maven-plugin:1.15:check (check-java-version) @ 

Build failed in Jenkins: PDFBox-Trunk-jdk9 » PDFBox parent #429

2018-04-14 Thread Apache Jenkins Server
See 


--
Established TCP socket on 34090
maven35-agent.jar already up to date
maven35-interceptor.jar already up to date
maven3-interceptor-commons.jar already up to date
Java HotSpot(TM) 64-Bit Server VM warning: Ignoring option MaxPermSize; support 
was removed in 8.0
<===[JENKINS REMOTING CAPACITY]===>   channel started
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by hudson.remoting.RemoteClassLoader 
(file:/home/jenkins/jenkins-slave/slave.jar) to method 
java.lang.ClassLoader.getClassLoadingLock(java.lang.String)
WARNING: Please consider reporting this to the maintainers of 
hudson.remoting.RemoteClassLoader
WARNING: Use --illegal-access=warn to enable warnings of further illegal 
reflective access operations
WARNING: All illegal access operations will be denied in a future release
Executing Maven:  -B -f 
/home/jenkins/jenkins-slave/workspace/PDFBox-Trunk-jdk9/trunk/pom.xml 
-Dmaven.repo.local=/home/jenkins/jenkins-slave/maven-repositories/0 clean 
install -Ppedantic,jdk9 -Dskip-bavaria=false
[INFO] Scanning for projects...
[WARNING] The project org.apache.pdfbox:pdfbox-parent:pom:3.0.0-SNAPSHOT uses 
prerequisites which is only intended for maven-plugin projects but not for non 
maven-plugin projects. For such purposes you should use the 
maven-enforcer-plugin. See 
https://maven.apache.org/enforcer/enforcer-rules/requireMavenVersion.html
[INFO] 
[INFO] Reactor Build Order:
[INFO] 
[INFO] PDFBox parent
[INFO] Apache FontBox
[INFO] Apache XmpBox
[INFO] Apache PDFBox
[INFO] Apache Preflight
[INFO] Apache Preflight application
[INFO] Apache PDFBox Debugger
[INFO] Apache PDFBox tools
[INFO] Apache PDFBox application
[INFO] Apache PDFBox Debugger application
[INFO] Apache PDFBox examples
[INFO] Apache PDFBox
[INFO] 
[INFO] 
[INFO] Building PDFBox parent 3.0.0-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-clean-plugin:3.0.0:clean (default-clean) @ pdfbox-parent ---
[TASKS] Scanning folder 
'
 for files matching the pattern '**/*.java' - excludes: 
[TASKS] Found 0 files to scan for tasks
Found 0 open tasks.
[TASKS] Computing warning deltas based on reference build #428
[INFO] 
[INFO] --- maven-enforcer-plugin:1.4.1:enforce (enforce-maven-version) @ 
pdfbox-parent ---
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.5:process (process-resource-bundles) 
@ pdfbox-parent ---
[INFO] 
[INFO] --- animal-sniffer-maven-plugin:1.15:check (check-java-version) @ 
pdfbox-parent ---
[INFO] Checking unresolved references to org.codehaus.mojo.signature:java17:1.0
[INFO] 
[INFO] --- maven-site-plugin:3.7:attach-descriptor (attach-descriptor) @ 
pdfbox-parent ---
[INFO] No site descriptor found: nothing to attach.
[INFO] 
[INFO] >>> maven-source-plugin:3.0.1:jar (attach-sources) > generate-sources @ 
pdfbox-parent >>>
[INFO] 
[INFO] --- maven-enforcer-plugin:1.4.1:enforce (enforce-maven-version) @ 
pdfbox-parent ---
[WARNING] Failed to getClass for org.apache.maven.plugins.source.SourceJarMojo
[INFO] 
[INFO] <<< maven-source-plugin:3.0.1:jar (attach-sources) < generate-sources @ 
pdfbox-parent <<<
[INFO] 
[INFO] 
[INFO] --- maven-source-plugin:3.0.1:jar (attach-sources) @ pdfbox-parent ---
[INFO] 
[INFO] --- apache-rat-plugin:0.12:check (default) @ pdfbox-parent ---
[INFO] Enabled default license matchers.
[INFO] Will parse SCM ignores for exclusions...
[INFO] Finished adding exclusions from SCM ignore files.
[INFO] 61 implicit excludes (use -debug for more details).
[INFO] Exclude: release.properties
[INFO] 1 resources included (use -debug for more details)
[INFO] Rat check: Summary over all files. Unapproved: 0, unknown: 0, generated: 
0, approved: 1 licenses.
[INFO] 
[INFO] --- dependency-check-maven:3.1.2:check (default) @ pdfbox-parent ---
[INFO] Checking for updates
[INFO] starting getUpdatesNeeded() ...
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[WARNING] Unable to 

[jira] [Commented] (PDFBOX-4182) Improve memory usage of PDFMergerUtility

2018-04-14 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438428#comment-16438428
 ] 

ASF subversion and git services commented on PDFBOX-4182:
-

Commit 1829154 from [~msahyoun] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1829154 ]

PDFBOX-4182, PDFBOX-4188: add new merge mode which closes the source PDDocument 
after the individual merge; early implementation

> Improve memory usage of PDFMergerUtility
> 
>
> Key: PDFBOX-4182
> URL: https://issues.apache.org/jira/browse/PDFBOX-4182
> Project: PDFBox
>  Issue Type: Improvement
>Affects Versions: 2.0.9
>Reporter: Pas Filip
>Priority: Major
> Attachments: PDFMergerUtilityUsingSupplier.java, Supplier.java, 
> Suppliers.java, 
> failed-merge-utility-4gb-heap-out-of-memory-after-1800-pdfs.png, 
> merge-pdf-stats.xlsx, merge-utility.patch, 
> oom-2gb-heap-after-refactoring-leak-suspect-1.png, 
> oom-2gb-heap-after-refactoring-leak-suspect-2.png, successful - 
> refactored-merge-utility-4gb-heap-2618-files-merged.png, successful 
> -merge-utility-6gb-heap-2618-files-merged.png, 
> successful-merge-utility-6gb-heap-2618-files-merged-setupTempFileOnly.png, 
> successful-merge-utility-8gb-heap-2618-files-merged.png, 
> successful-refactored-merge-utility-4gb-heap-2618-files-merged-setupTempFileOnly.png
>
>
> I have been running some tests trying to merge large amounts (2618) of small 
> pdf documents, between 100kb and 130kb, into a single large pdf (288.433kb)
> Memory consumption seems to be the main limitation.
> ScratchFileBuffer seems to consume the majority of the memory usage.
> (see screenshot from mat in attachment)
> (I would include the hprof in attachment so you can analyze yourselves but 
> it's rather large)
> Note that it seems impossible to generate a large pdf using a small memory 
> footprint.
> I personally thought that using MemorySettings with temporary file only would 
> allow me to generate arbitrarily large pdf files but it doesn't seem to help.
> I've run the mergeDocuments with  memory settings:
>  * MemoryUsageSetting.setupMixed(1024L * 1024L, 1024L * 1024L * 1024L * 1024L 
> * 1024L)
>  * MemoryUsageSetting.setupTempFileOnly()
> Refactored version completes with *4GB* heap:
> with temp file only completes 2618 documents in 1.760 min
> *VS*
> *8GB* heap:
> with temp file only completes 2618 documents in 2.0 min
> Heaps of 6gb or less result in OOM. (Didn't try different sizes between 6GB 
> and 8GB)
>  It looks like the loop in the mergeDocuments accumulates PDDocument objects 
> in a list which are closed after the merge is completed.
> Refactoring the code to close these as they are used, instead of accumulating 
> them and closing all at the end, improves memory usage considerably (although 
> the problem doesn't seem to be eliminated completely, based on the MAT analysis).
> Another change I've implemented is to only create the inputstream when the 
> file needs to be read and to close it alongside the PDDocument.
> (Some inputstreams contain buffers and depending on the size of the buffers 
> and or the stream type accumulating all the streams is a potential 
> memory-hog.)
> These changes seem to be beneficial, in the sense that I can 
> process the same number of pdfs with about half the memory.
>  I'd appreciate it if you could roll these changes into the main codebase.
> (I've respected java 6 compatibility.)
> I've included in attachment the java files of the new implementation:
>  * Suppliers
>  * Supplier
>  * PDFMergerUtilityUsingSupplier
> PDFMergerUtilityUsingSupplier can replace the previous version. No signature 
> changes, only internal code changes. (Just rename the class to 
> PDFMergerUtility if you decide to implement the changes.)
>  In attachment you can also find some screenshots from visualvm showing the 
> memory usage of the original version and the refactored version as well as 
> some info produced by mat after analysing the heap.
> If you know of any other means, without running into memory issues, to merge 
> large sets of pdf files into a large single pdf I'd love to hear about it!
> I'd also suggest that further improvements be made to memory usage, 
> as pdfbox seems to consume a lot of memory in general.
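
The difference between the two closing strategies described above can be sketched without PDFBox. This is only a model under stated assumptions: `Doc` is a hypothetical stand-in for a source document, and the counters exist purely to measure how many documents are open at once. It is not the attached PDFMergerUtilityUsingSupplier, and it uses Java 8 syntax for brevity although the original patch targets Java 6.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class MergeCloseSketch {
    /** Hypothetical stand-in for a source document; tracks how many are open. */
    static final class Doc implements AutoCloseable {
        private final AtomicInteger open;

        Doc(AtomicInteger open, AtomicInteger peak) {
            this.open = open;
            int now = open.incrementAndGet();
            peak.accumulateAndGet(now, Math::max); // remember the high-water mark
        }

        @Override
        public void close() {
            open.decrementAndGet();
        }
    }

    /** Opens every source up front and closes them all after the merge. */
    public static int peakWhenAccumulating(int sources) {
        AtomicInteger open = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Doc> opened = new ArrayList<>();
        try {
            for (int i = 0; i < sources; i++) {
                opened.add(new Doc(open, peak)); // stays open until the end
            }
        } finally {
            for (Doc d : opened) {
                d.close();
            }
        }
        return peak.get();
    }

    /** Opens each source only when it is needed and closes it right after. */
    public static int peakWhenClosingEarly(int sources) {
        AtomicInteger open = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Supplier<Doc>> suppliers = new ArrayList<>();
        for (int i = 0; i < sources; i++) {
            suppliers.add(() -> new Doc(open, peak)); // nothing is opened yet
        }
        for (Supplier<Doc> s : suppliers) {
            try (Doc d = s.get()) {
                // the pages of d would be appended to the destination here
            }
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println(peakWhenAccumulating(2618)); // prints 2618
        System.out.println(peakWhenClosingEarly(2618)); // prints 1
    }
}
```

The model only counts open handles; how much memory each open document really holds depends on the scratch file settings discussed in the issue.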



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Commented] (PDFBOX-4188) "Maximum allowed scratch file memory exceeded." Exception when merging large number of small PDFs

2018-04-14 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438429#comment-16438429
 ] 

ASF subversion and git services commented on PDFBOX-4188:
-

Commit 1829154 from [~msahyoun] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1829154 ]

PDFBOX-4182, PDFBOX-4188: add new merge mode which closes the source PDDocument 
after the individual merge; early implementation

>  "Maximum allowed scratch file memory exceeded." Exception when merging large 
> number of small PDFs
> --
>
> Key: PDFBOX-4188
> URL: https://issues.apache.org/jira/browse/PDFBOX-4188
> Project: PDFBox
>  Issue Type: Improvement
>Affects Versions: 2.0.9, 3.0.0 PDFBox
>Reporter: Gary Potagal
>Priority: Major
> Attachments: PDFBOX-4188-MemoryManagerPatch.zip, 
> PDFBOX-4188-breakingTest.zip, PDFMergerUtility.java-20180412.patch
>
>
>  
> On 06.04.2018 at 23:10, Gary Potagal wrote:
>  
> We wanted to address one more merge issue in 
> org.apache.pdfbox.multipdf.PDFMergerUtility#mergeDocuments(org.apache.pdfbox.io.MemoryUsageSetting).
> We need to merge a large number of small files.  We use mixed mode, memory 
> and disk for cache.  Initially, we would often get "Maximum allowed scratch 
> file memory exceeded.", unless we turned off the check by passing "-1" to 
> org.apache.pdfbox.io.MemoryUsageSetting#MemoryUsageSetting.  I believe this 
> is what the users who opened PDFBOX-3721 were running into.
> Our research indicates that the core issue with the memory model is that 
> instead of sharing a single cache, it breaks it up into equal sized fixed 
> partitions based on the number of input + output files being merged.  This 
> means that each partition must be big enough to hold the final output file.  
> When 400 1-page files are merged, this creates 401 partitions, each of 
> which needs to be big enough to hold the final 400 pages.  Even worse, the 
> merge algorithm needs to keep all files open until the end.
> Given this, near the end of the merge, we're actually caching 400 x 1-page 
> input files, and 1 x 400-page output file, or 800 pages.
> However, with the partitioned cache, we need to declare room for 401  x 
> 400-pages, or 160,400 pages in total when specifying "maxStorageBytes".  This 
> would be a very high number, usually in the gigabytes.
>  
> Given the current limitation that we need to keep all the input files open 
> until the output file is written (HUGE), we came up with 2 options.  (See 
> PDFBOX-4182)  
>  
> 1.  Good: Split the cache in ½, give ½ to the output file, and segment the 
> other ½ across the input files. (Still keeping them open until the end.)
> 2.  Better: Dynamically allocate in 16 page (64K) chunks from memory or disk 
> on demand, release cache as documents are closed after merge.  This is our 
> current implementation till PDFBOX-3999, PDFBOX-4003 and PDFBOX-4004 are 
> addressed.
>  
> We would like to submit our current implementation as a Patch to 2.0.10 and 
> 3.0.0, unless this is already addressed.
>  
>  Thank you
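
The arithmetic in the description above can be written out as a small sketch. The class and method names are hypothetical; the 4096-byte page size is not stated in the message and is only derived here from "16 page (64K) chunks". With the message's own inputs, the shared-cache total computes as 400 + 400 = 800 pages, against 401 x 400 = 160,400 pages for equal fixed partitions.

```java
public class ScratchBudgetSketch {
    /** Assumed page size, derived from 64K / 16 pages (not stated in the message). */
    public static final int PAGE_BYTES = 4096;
    /** Chunk size in pages, as given in option 2 of the message. */
    public static final int CHUNK_PAGES = 16;

    /** Pages to reserve when every partition must be able to hold the output. */
    public static long partitionedPages(int inputFiles, int pagesPerFile) {
        long outputPages = (long) inputFiles * pagesPerFile;
        long partitions = inputFiles + 1L; // one per input plus one for the output
        return partitions * outputPages;
    }

    /** Pages actually cached near the end of the merge with one shared cache. */
    public static long sharedPages(int inputFiles, int pagesPerFile) {
        long inputPages = (long) inputFiles * pagesPerFile;
        long outputPages = inputPages; // the output holds every input page once
        return inputPages + outputPages;
    }

    /** Size in bytes of one dynamically allocated chunk. */
    public static int chunkBytes() {
        return CHUNK_PAGES * PAGE_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(partitionedPages(400, 1)); // prints 160400
        System.out.println(sharedPages(400, 1));      // prints 800
        System.out.println(chunkBytes());             // prints 65536
    }
}
```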






[jira] [Commented] (PDFBOX-3809) PDAcroForm.flatten(PDField list, refreshAppearances boolean) flattens all form fields instead of specified ones.

2018-04-14 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438416#comment-16438416
 ] 

ASF subversion and git services commented on PDFBOX-3809:
-

Commit 1829151 from [~msahyoun] in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1829151 ]

PDFBOX-3809: flatten only specified fields

> PDAcroForm.flatten(PDField list, refreshAppearances boolean) flattens all 
> form fields instead of specified ones.
> 
>
> Key: PDFBOX-3809
> URL: https://issues.apache.org/jira/browse/PDFBOX-3809
> Project: PDFBox
>  Issue Type: Improvement
>  Components: AcroForm
>Affects Versions: 2.0.5, 2.0.6, 2.0.7
>Reporter: Cristin Donaher
>Assignee: Maruan Sahyoun
>Priority: Minor
> Fix For: 2.0.10, 3.0.0 PDFBox
>
> Attachments: Example of fields that need to enter and the calculated 
> field from those values.docx, sf270.pdf
>
>
> Thanks for the excellent PDF library.   For my use case I need to flatten a 
> subset of the AcroForm fields.  I was attempting to use the 
> PDAcroForm.flatten call, passing in my field list.  However, after the method 
> is called, all the fields are gone.  
> The method itself appears to remove all PDFAnnotationWidgets from each page 
> and at the end clears the acroform's field set.
> Is the javadoc description (This will flatten the specified form fields.) 
> just misleading?   Could a flatten call for a subset of fields be added?
> Thanks
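
The requested behaviour ("flatten only specified fields", as the commit message puts it) can be modelled without PDFBox. Everything below is a hypothetical model: fields and widgets are plain strings, and `FlattenSubsetSketch` is not the PDAcroForm implementation. It only contrasts removing the selected widgets and fields with clearing everything.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FlattenSubsetSketch {
    /** Names of the fields that are still interactive (hypothetical model). */
    public final List<String> fields = new ArrayList<>();
    /** For each page, the names of the fields whose widgets sit on that page. */
    public final List<List<String>> widgetsByPage = new ArrayList<>();

    /** Flattens only the given fields and leaves all others in place. */
    public void flatten(List<String> toFlatten) {
        if (toFlatten.isEmpty()) {
            return; // nothing selected: leave the form untouched
        }
        Set<String> selected = new HashSet<>(toFlatten);
        for (List<String> pageWidgets : widgetsByPage) {
            // Remove only the widgets of selected fields, not every widget.
            pageWidgets.removeIf(selected::contains);
        }
        // Drop only the selected fields instead of clearing the whole field set.
        fields.removeIf(selected::contains);
    }

    public static void main(String[] args) {
        FlattenSubsetSketch form = new FlattenSubsetSketch();
        form.fields.addAll(Arrays.asList("name", "total", "date"));
        form.widgetsByPage.add(new ArrayList<>(Arrays.asList("name", "total")));
        form.widgetsByPage.add(new ArrayList<>(Arrays.asList("date")));
        form.flatten(Arrays.asList("total"));
        System.out.println(form.fields);        // prints [name, date]
        System.out.println(form.widgetsByPage); // prints [[name], [date]]
    }
}
```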






[jira] [Commented] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table

2018-04-14 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438411#comment-16438411
 ] 

Tilman Hausherr commented on PDFBOX-4189:
-

Re subsetting: in your call of {{PDType0Font.load}}, set the last parameter 
to true or remove it, and see what happens. Subsetting 
means PDFBox creates a new font with only the glyphs that are really used, so 
generated files get smaller (for example, the Arial Uni font has a size of 
23MB!). Please have a look at {{PDAbstractContentStream.showTextInternal}}. 
This takes all codepoints and remembers which will be in the subset. I suspect 
that you'd need to know what actual codepoints are used after the substitutions.

Re GlyphsubstitutionTable, yeah, just move it back, 
thanks.
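
A toy sketch of why the subset needs to know the glyphs used after substitution. The cmap, the single ligature rule, and all glyph ids below are made up for illustration, and a Latin "fi" ligature stands in for the Bengali substitutions. This is not FontBox's API, only the bookkeeping idea.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class GsubSubsetSketch {
    /** Hypothetical cmap: code point to glyph id. */
    static final Map<Integer, Integer> CMAP = new HashMap<>();
    /** Hypothetical GSUB ligature rule: pair of glyph ids to one glyph id. */
    static final Map<List<Integer>, Integer> LIGATURES = new HashMap<>();

    static {
        CMAP.put((int) 'f', 10);
        CMAP.put((int) 'i', 11);
        CMAP.put((int) 'x', 12);
        LIGATURES.put(Arrays.asList(10, 11), 99); // f + i becomes the "fi" glyph
    }

    /** Maps text to glyph ids, then applies the pairwise ligature rule. */
    public static List<Integer> shape(String text) {
        List<Integer> gids = new ArrayList<>();
        text.codePoints().forEach(cp -> gids.add(CMAP.getOrDefault(cp, 0)));
        List<Integer> out = new ArrayList<>();
        int i = 0;
        while (i < gids.size()) {
            if (i + 1 < gids.size()) {
                Integer lig = LIGATURES.get(Arrays.asList(gids.get(i), gids.get(i + 1)));
                if (lig != null) {
                    out.add(lig); // substituted glyph has no code point of its own
                    i += 2;
                    continue;
                }
            }
            out.add(gids.get(i));
            i++;
        }
        return out;
    }

    /** Glyph ids the subset must contain: those present after substitution. */
    public static Set<Integer> subsetGids(String text) {
        return new TreeSet<>(shape(text));
    }

    public static void main(String[] args) {
        System.out.println(shape("fix"));      // prints [99, 12]
        System.out.println(subsetGids("fix")); // prints [12, 99]
    }
}
```

Tracking by code point alone would record f, i and x (glyphs 10, 11, 12 in this toy font) and would miss glyph 99, which is the one actually drawn.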

> Enable rendering of Indian languages, by reading and utilizing the GSUB table
> -
>
> Key: PDFBOX-4189
> URL: https://issues.apache.org/jira/browse/PDFBOX-4189
> Project: PDFBox
>  Issue Type: New Feature
>  Components: FontBox, PDModel
>Reporter: Palash Ray
>Priority: Major
> Attachments: Bengali-text-after.pdf, Bengali-text-before.pdf
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> Implemented proper rendering of Indian languages, which need extensive Glyph 
> substitution. The GSUB table has been read and used effectively to replace 
> some compound words with their respective Glyphs. All tests are passing. I 
> have tested this for the Bengali font. Please review these changes and let me 
> know if it makes sense to incorporate these.






[jira] [Commented] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table

2018-04-14 Thread Palash Ray (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438401#comment-16438401
 ] 

Palash Ray commented on PDFBOX-4189:


* +What is the story of having different data for jdk7 and jdk8+

Out of 323 entries for the GSUB table for the Bengali-Lohit.ttf font, I am 
getting a single entry which differs for jdk1.7 and 1.8. That's the reason I had 
to create the 2 files. I am still investigating this, so maybe I will come up 
with a better solution when I get to the bottom of this.
 * +I'd also need to know where this file came from, or whether you created it 
yourself from other data+

Those .txt files are simple reference data used for testing the correctness of 
the GSUB tables. I created them with some logic that transforms 
Unicode characters into base-10 numbers.
 * +BengaliPdfGenerationHelloWorld should be integrated into the 
EmbeddedFonts.java example+

Will do
 * +why a log4j2.xml ? We don't use log4j2 except in preflight where log4j is 
used in Tests+

Agreed, I will remove the log4j2.xml
 * +You disabled subsetting+

I don't understand that yet. Please bear with me, I will make it work even with 
that. Let me take a look.
 * +The move of GlyphsubstitutionTable+

I can move it back if it simplifies things. Should I?
 * +There is a lot of logging done+

Will do
 * +Loosening scope restrictions is a bit of a no-no+

Agreed. I did this as a part of the move of GlyphsubstitutionTable, if I undo 
the move, this will be taken care of.

 
 * +Public methods should have a javadoc+

Will do

 

Thanks,

Palash.
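
One plausible reading of "transforming unicode characters into base-10 numbers", sketched under assumptions: the actual generator code and the layout of the .txt files are not shown in this thread, so the class below is hypothetical and only demonstrates turning each code point of a string into its decimal value.

```java
import java.util.stream.Collectors;

public class CodePointDump {
    /** Returns the decimal code points of the text, separated by spaces. */
    public static String toDecimal(String text) {
        return text.codePoints()
                .mapToObj(cp -> Integer.toString(cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // U+0986, U+09AE, U+09BF: three Bengali characters written as escapes.
        System.out.println(toDecimal("\u0986\u09AE\u09BF")); // prints 2438 2478 2495
    }
}
```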

 







[jira] [Resolved] (PDFBOX-4185) Fetching options for PDChoice causes ClassCastException

2018-04-14 Thread Maruan Sahyoun (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maruan Sahyoun resolved PDFBOX-4185.

Resolution: Fixed

resolved. Thanks [~matthias.gall] for the report and your tests.

> Fetching options for PDChoice causes ClassCastException 
> 
>
> Key: PDFBOX-4185
> URL: https://issues.apache.org/jira/browse/PDFBOX-4185
> Project: PDFBox
>  Issue Type: Bug
>  Components: AcroForm
>Affects Versions: 2.0.4, 2.0.9, 3.0.0 PDFBox
>Reporter: Maruan Sahyoun
>Assignee: Maruan Sahyoun
>Priority: Major
> Fix For: 2.0.10, 3.0.0 PDFBox
>
>
> I am trying to fetch the options available for a PDChoice field in a form but 
> get a ClassCastException from the PDFBox internals.
> The problematic PDF is an Inheritance Tax form from the UK's Revenue and 
> Customs, specifically I am currently looking at IHT405:
> https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/697346/IHT405_online.pdf
> I use this code to iterate over the fields:
> {code}
>   PDDocument doc = PDDocument.load(resource.getFile());
>   PDDocumentCatalog catalog = doc.getDocumentCatalog();
>   PDAcroForm form = catalog.getAcroForm();
>   for (PDField field : form.getFields()) {
>       if ("Ch".equals(field.getFieldType())) {
>           PDChoice choice = (PDChoice) field;
>           // All these variants fail with a ClassCastException:
>           choice.getOptions();
>           choice.getOptionsDisplayValues();
>           // internally just delegates to getOptions()
>           choice.getOptionsExportValues();
>       }
>   }
> {code}
> This is a stacktrace for e.g. the getOptionsExportValues() call:
> {noformat}
>   java.lang.ClassCastException: org.apache.pdfbox.cos.COSArray cannot be cast to org.apache.pdfbox.cos.COSString
>   at org.apache.pdfbox.pdmodel.common.COSArrayList.convertCOSStringCOSArrayToList(COSArrayList.java:367)
>   at org.apache.pdfbox.pdmodel.interactive.form.FieldUtils.getPairableItems(FieldUtils.java:182)
>   at org.apache.pdfbox.pdmodel.interactive.form.PDChoice.getOptions(PDChoice.java:91)
>   at org.apache.pdfbox.pdmodel.interactive.form.PDChoice.getOptionsExportValues(PDChoice.java:210)
> {noformat}
> The problem is that the expected "stringArray" also contains COSArrays with 
> value and label for the options:
> {noformat}
>   COSArray{[COSString{ }, COSArray{[COSString{Mr}, COSString{MR}]}, 
> COSArray{[COSString{Mrs}, COSString{MRS}]}, COSArray{[COSString{Miss}, 
> COSString{MISS}]}, COSArray{[COSString{Ms}, COSString{MS}]}]}
> {noformat}
> This does not seem to be expected in FieldUtils.getPairableItems, which 
> introspects only the first item of the array and thus treats the array as an 
> array of strings.
> I found the bug with PDFBox 2.0.4 and upgraded to 2.0.9 which didn't help.
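
A minimal sketch of per-item handling for such a mixed options array, using plain Java collections instead of COS objects. It is an assumption about the shape of a fix, not the change that was committed: each item is inspected on its own, a plain string serves as both export and display value, and a two-element pair supplies export value first and display text second (the order the PDF specification gives for the Opt entry).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ChoiceOptionsSketch {
    public static final int EXPORT = 0;
    public static final int DISPLAY = 1;

    /**
     * Collects export or display values from a mixed options list.
     * Items are either a String or a two-element List of Strings.
     */
    public static List<String> values(List<?> options, int pairIndex) {
        List<String> result = new ArrayList<>();
        for (Object item : options) {
            if (item instanceof String) {
                result.add((String) item); // same text for export and display
            } else if (item instanceof List) {
                List<?> pair = (List<?>) item;
                result.add((String) pair.get(pairIndex));
            } else {
                throw new IllegalArgumentException("unexpected option item: " + item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Mirrors the array from the report: one plain string, then pairs.
        List<Object> options = Arrays.asList(
                " ",
                Arrays.asList("Mr", "MR"),
                Arrays.asList("Mrs", "MRS"),
                Arrays.asList("Miss", "MISS"),
                Arrays.asList("Ms", "MS"));
        System.out.println(values(options, EXPORT));  // prints [ , Mr, Mrs, Miss, Ms]
        System.out.println(values(options, DISPLAY)); // prints [ , MR, MRS, MISS, MS]
    }
}
```

Checking only the first item, as the report describes, would classify this list as plain strings and fail on the second element.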






[jira] [Commented] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table

2018-04-14 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438383#comment-16438383
 ] 

Tilman Hausherr commented on PDFBOX-4189:
-

Your patch is very much appreciated of course, thank you. It will probably 
result in thousands of new users / usages. This is a complex patch so expect this 
to take some time before it is committed. See PDFBOX-4106 for an example of a 
complex patch and the discussion.

Can you please also add an apache header to the .txt files? See the file 
"pdfbox\src\main\resources\org\apache\pdfbox\resources\glyphlist\additional.txt".
 I'd also need to know where this file came from, or whether you created it 
yourself from other data; if yes, please include a comment explaining how, and/or the code 
that created the file.

About the commits:
 - What is the story of having different data for jdk7 and jdk8?
 - BengaliPdfGenerationHelloWorld should be integrated into the 
EmbeddedFonts.java example
 - why a log4j2.xml ? We don't use log4j2 except in preflight where log4j is 
used in Tests
 - I think I understand why my example didn't work: you disabled subsetting. 
With subsetting, the subsetter needs to "know" which glyphs are used. We 
do need subsetting, because otherwise files might get huge.
 - The generated PDF file has trouble with text extraction: "আমি কোন পথƶ §ীরƶর 
ল©ী ষĞ পুতুল Šপো গÄা ঋষি" i.e. there are some unknown glyphs.
 - The move of GlyphsubstitutionTable breaks the API. Like I said in the PR, if 
you keep the API as it is (only expand, not change existing methods) then your 
change could be used for 2.0 too. The release of 3.0 could take years. The 
release of 2.0.10 only a few months.
 - There is a lot of logging done ("WARNUNG: oldValue: [52, 114] will be 
overridden with newValue: [114, 52]"). This is scary and should be changed or 
removed. It scares users and they create issues, thinking that something went 
wrong. If you change it to debug, please include a comment what this is about. 
See also the discussion in PDFBOX-4106, about "Trying to 
un-substitute a never-before-seen gid".
 - Loosening scope restrictions is a bit of a no-no, as done in 
[TTFDataStream.java|https://github.com/apache/pdfbox/pull/46/files#diff-894ae790d373c62634ceed941b264dc3]
 , 
[TTFTable.java|https://github.com/apache/pdfbox/pull/46/files#diff-355fd8e3330f392bdae0778f942dc124]
 , and maybe elsewhere. As preached by "Effective Java", item 15: "make each 
class or member as inaccessible as possible".
 - Public methods should have a javadoc, same for classes. It doesn't have to 
be big, just make it good enough for other people to understand what is done. 
See also [https://pdfbox.apache.org/codingconventions.html] , I think most 
conventions are already respected.

I have not yet done a proper review of the code (looking side-by-side), so more 
questions may be coming.







[jira] [Created] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table

2018-04-14 Thread Palash Ray (JIRA)
Palash Ray created PDFBOX-4189:
--

 Summary: Enable rendering of Indian languages, by reading and 
utilizing the GSUB table
 Key: PDFBOX-4189
 URL: https://issues.apache.org/jira/browse/PDFBOX-4189
 Project: PDFBox
  Issue Type: New Feature
  Components: FontBox, PDModel
Reporter: Palash Ray
 Attachments: Bengali-text-after.pdf, Bengali-text-before.pdf

Implemented proper rendering of Indian languages, which need extensive Glyph 
substitution. The GSUB table has been read and used effectively to replace some 
compound words with their respective Glyphs. All tests are passing. I have 
tested this for the Bengali font. Please review these changes and let me know 
if it makes sense to incorporate these.






[jira] [Commented] (PDFBOX-4185) Fetching options for PDChoice causes ClassCastException

2018-04-14 Thread Matthias Gall (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438351#comment-16438351
 ] 

Matthias Gall commented on PDFBOX-4185:
---

[~msahyoun] works with 2.0.10-SNAPSHOT, thanks!







Build failed in Jenkins: PDFBox-trunk #3968

2018-04-14 Thread Apache Jenkins Server
See 


Changes:

[msahyoun] PDFBOX-3809: return early for empty field list; remove rendering 
test for flatten of specific fields

--
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building remotely on H25 (ubuntu xenial) in workspace 

Cleaning up 
Deleting 
Updating http://svn.apache.org/repos/asf/pdfbox/trunk at revision 
'2018-04-14T13:58:10.442 +'
U 
pdfbox/src/test/java/org/apache/pdfbox/pdmodel/interactive/form/PDAcroFormTest.java
U 
pdfbox/src/main/java/org/apache/pdfbox/pdmodel/interactive/form/PDAcroForm.java
At revision 1829139

Parsing POMs
Established TCP socket on 32963
maven35-agent.jar already up to date
maven35-interceptor.jar already up to date
maven3-interceptor-commons.jar already up to date
[trunk] $ /home/jenkins/tools/java/jdk1.8.0_66-unlimited-security/bin/java 
-Xmx1g -XX:MaxPermSize=300m -cp 
/home/jenkins/jenkins-slave/maven35-agent.jar:/home/jenkins/tools/maven/apache-maven-3.5.0/boot/plexus-classworlds-2.5.2.jar:/home/jenkins/tools/maven/apache-maven-3.5.0/conf/logging
 jenkins.maven3.agent.Maven35Main /home/jenkins/tools/maven/apache-maven-3.5.0 
/home/jenkins/jenkins-slave/slave.jar 
/home/jenkins/jenkins-slave/maven35-interceptor.jar 
/home/jenkins/jenkins-slave/maven3-interceptor-commons.jar 32963
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=300m; 
support was removed in 8.0
<===[JENKINS REMOTING CAPACITY]===>   channel started
Executing Maven:  -B -f 
 
-Dmaven.repo.local=/home/jenkins/jenkins-slave/maven-repositories/0 clean 
deploy -Ppedantic -Dskip-bavaria=false
[INFO] Scanning for projects...
[WARNING] The project org.apache.pdfbox:pdfbox-parent:pom:3.0.0-SNAPSHOT uses 
prerequisites which is only intended for maven-plugin projects but not for non 
maven-plugin projects. For such purposes you should use the 
maven-enforcer-plugin. See 
https://maven.apache.org/enforcer/enforcer-rules/requireMavenVersion.html
[INFO] 
[INFO] Reactor Build Order:
[INFO] 
[INFO] PDFBox parent
[INFO] Apache FontBox
[INFO] Apache XmpBox
[INFO] Apache PDFBox
[INFO] Apache Preflight
[INFO] Apache Preflight application
[INFO] Apache PDFBox Debugger
[INFO] Apache PDFBox tools
[INFO] Apache PDFBox application
[INFO] Apache PDFBox Debugger application
[INFO] Apache PDFBox examples
[INFO] Apache PDFBox
[INFO] 
[INFO] 
[INFO] Building PDFBox parent 3.0.0-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-clean-plugin:3.0.0:clean (default-clean) @ pdfbox-parent ---
[TASKS] Scanning folder 
' for files 
matching the pattern '**/*.java' - excludes: 
[TASKS] Found 0 files to scan for tasks
Found 0 open tasks.
[TASKS] Computing warning deltas based on reference build #3966
[INFO] 
[INFO] --- maven-enforcer-plugin:1.4.1:enforce (enforce-maven-version) @ 
pdfbox-parent ---
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.5:process (process-resource-bundles) 
@ pdfbox-parent ---
[INFO] 
[INFO] --- animal-sniffer-maven-plugin:1.15:check (check-java-version) @ 
pdfbox-parent ---
[INFO] Checking unresolved references to org.codehaus.mojo.signature:java17:1.0
[INFO] 
[INFO] --- maven-site-plugin:3.7:attach-descriptor (attach-descriptor) @ 
pdfbox-parent ---
[INFO] No site descriptor found: nothing to attach.
[INFO] 
[INFO] >>> maven-source-plugin:3.0.1:jar (attach-sources) > generate-sources @ 
pdfbox-parent >>>
[INFO] 
[INFO] --- maven-enforcer-plugin:1.4.1:enforce (enforce-maven-version) @ 
pdfbox-parent ---
[WARNING] Failed to getClass for org.apache.maven.plugins.source.SourceJarMojo
[INFO] 
[INFO] <<< maven-source-plugin:3.0.1:jar (attach-sources) < generate-sources @ 
pdfbox-parent <<<
[INFO] 
[INFO] 
[INFO] --- maven-source-plugin:3.0.1:jar (attach-sources) @ pdfbox-parent ---
[INFO] 
[INFO] --- apache-rat-plugin:0.12:check (default) @ pdfbox-parent ---
[INFO] Enabled default license matchers.
[INFO] Will parse SCM ignores for exclusions...
[INFO] Finished adding exclusions from SCM ignore files.
[INFO] 61 implicit excludes (use -debug for more details).
[INFO] Exclude: release.properties
[INFO] 1 resources included (use -debug for more details)
[INFO] Rat check: Summary over all files. Unapproved: 0, unknown: 0, generated: 
0, approved: 1 licenses.
[INFO] 
[INFO] --- dependency-check-maven:3.1.2:check (default) @ pdfbox-parent ---
[INFO] Checking for updates
[INFO] starting getUpdatesNeeded() ...
[ERROR] IO Exception: 

Build failed in Jenkins: PDFBox-trunk » PDFBox parent #3968

2018-04-14 Thread Apache Jenkins Server
See 


--
Established TCP socket on 32963
maven35-agent.jar already up to date
maven35-interceptor.jar already up to date
maven3-interceptor-commons.jar already up to date
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=300m; 
support was removed in 8.0
<===[JENKINS REMOTING CAPACITY]===>   channel started
Executing Maven:  -B -f 
/home/jenkins/jenkins-slave/workspace/PDFBox-trunk/trunk/pom.xml 
-Dmaven.repo.local=/home/jenkins/jenkins-slave/maven-repositories/0 clean 
deploy -Ppedantic -Dskip-bavaria=false
[INFO] Scanning for projects...
[WARNING] The project org.apache.pdfbox:pdfbox-parent:pom:3.0.0-SNAPSHOT uses 
prerequisites which is only intended for maven-plugin projects but not for non 
maven-plugin projects. For such purposes you should use the 
maven-enforcer-plugin. See 
https://maven.apache.org/enforcer/enforcer-rules/requireMavenVersion.html
[INFO] 
[INFO] Reactor Build Order:
[INFO] 
[INFO] PDFBox parent
[INFO] Apache FontBox
[INFO] Apache XmpBox
[INFO] Apache PDFBox
[INFO] Apache Preflight
[INFO] Apache Preflight application
[INFO] Apache PDFBox Debugger
[INFO] Apache PDFBox tools
[INFO] Apache PDFBox application
[INFO] Apache PDFBox Debugger application
[INFO] Apache PDFBox examples
[INFO] Apache PDFBox
[INFO] 
[INFO] 
[INFO] Building PDFBox parent 3.0.0-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-clean-plugin:3.0.0:clean (default-clean) @ pdfbox-parent ---
[TASKS] Scanning folder 
'
 for files matching the pattern '**/*.java' - excludes: 
[TASKS] Found 0 files to scan for tasks
Found 0 open tasks.
[TASKS] Computing warning deltas based on reference build #3966
[INFO] 
[INFO] --- maven-enforcer-plugin:1.4.1:enforce (enforce-maven-version) @ 
pdfbox-parent ---
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.5:process (process-resource-bundles) 
@ pdfbox-parent ---
[INFO] 
[INFO] --- animal-sniffer-maven-plugin:1.15:check (check-java-version) @ 
pdfbox-parent ---
[INFO] Checking unresolved references to org.codehaus.mojo.signature:java17:1.0
[INFO] 
[INFO] --- maven-site-plugin:3.7:attach-descriptor (attach-descriptor) @ 
pdfbox-parent ---
[INFO] No site descriptor found: nothing to attach.
[INFO] 
[INFO] >>> maven-source-plugin:3.0.1:jar (attach-sources) > generate-sources @ 
pdfbox-parent >>>
[INFO] 
[INFO] --- maven-enforcer-plugin:1.4.1:enforce (enforce-maven-version) @ 
pdfbox-parent ---
[WARNING] Failed to getClass for org.apache.maven.plugins.source.SourceJarMojo
[INFO] 
[INFO] <<< maven-source-plugin:3.0.1:jar (attach-sources) < generate-sources @ 
pdfbox-parent <<<
[INFO] 
[INFO] 
[INFO] --- maven-source-plugin:3.0.1:jar (attach-sources) @ pdfbox-parent ---
[INFO] 
[INFO] --- apache-rat-plugin:0.12:check (default) @ pdfbox-parent ---
[INFO] Enabled default license matchers.
[INFO] Will parse SCM ignores for exclusions...
[INFO] Finished adding exclusions from SCM ignore files.
[INFO] 61 implicit excludes (use -debug for more details).
[INFO] Exclude: release.properties
[INFO] 1 resources included (use -debug for more details)
[INFO] Rat check: Summary over all files. Unapproved: 0, unknown: 0, generated: 
0, approved: 1 licenses.
[INFO] 
[INFO] --- dependency-check-maven:3.1.2:check (default) @ pdfbox-parent ---
[INFO] Checking for updates
[INFO] starting getUpdatesNeeded() ...
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[ERROR] IO Exception: connect timed out
[WARNING] Unable to download the NVD CVE data; the results may not include the 
most recent CPE/CVEs from the NVD.
[INFO] If you are behind a proxy you may need to configure dependency-check to 
use the proxy.
[WARNING] Unable to update Cached Web DataSource, using local data instead. 
Results may not include recent vulnerabilities.
[INFO] Analysis Started
[INFO] Finished File Name Analyzer (0 seconds)
[INFO] Finished Dependency Merging Analyzer (0 seconds)
[INFO] Finished Version Filter Analyzer (0 seconds)
[INFO] Finished Hint Analyzer (0 seconds)
[INFO] Created CPE Index (1 seconds)
[INFO] Skipping CPE Analysis for npm
[INFO] 

[jira] [Commented] (PDFBOX-3809) PDAcroForm.flatten(PDField list, refreshAppearances boolean) flattens all form fields instead of specified ones.

2018-04-14 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438347#comment-16438347
 ] 

ASF subversion and git services commented on PDFBOX-3809:
-

Commit 1829139 from [~msahyoun] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1829139 ]

PDFBOX-3809: return early for empty field list; remove rendering test for 
flatten of specific fields

> PDAcroForm.flatten(PDField list, refreshAppearances boolean) flattens all 
> form fields instead of specified ones.
> 
>
>                 Key: PDFBOX-3809
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3809
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: AcroForm
>    Affects Versions: 2.0.5, 2.0.6, 2.0.7
>            Reporter: Cristin Donaher
>            Assignee: Maruan Sahyoun
>            Priority: Minor
>             Fix For: 2.0.10, 3.0.0 PDFBox
>
>         Attachments: Example of fields that need to enter and the calculated 
> field from those values.docx, sf270.pdf
>
>
> Thanks for the excellent PDF library.   For my use case I need to flatten a 
> subset of the AcroForm fields.  I was attempting to use the 
> PDAcroForm.flatten call, passing in my field list.  However, after the method 
> is called, all the fields are gone.  
> The method itself appears to remove all PDFAnnotationWidgets from each page 
> and at the end clears the acroform's field set.
> Is the javadoc description (This will flatten the specified form fields.) 
> just misleading?   Could a flatten call for a subset of fields be added?
> Thanks
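
The behaviour requested above (flatten only the listed fields) together with the committed guard ("return early for empty field list") can be sketched with a toy model. This is not PDFBox code: ToyForm, its string field names, and its methods are hypothetical stand-ins used only to illustrate the intended semantics of a subset flatten, under the assumption that flattening moves a field's appearance into static page content and removes the field from the form.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy stand-in for a form; hypothetical, NOT the PDFBox PDAcroForm class. */
class ToyForm {
    // Names of the fields that are still interactive.
    private final Set<String> fields = new LinkedHashSet<>();
    // Names of fields whose appearance has been turned into static content.
    private final List<String> pageContent = new ArrayList<>();

    ToyForm(String... names) {
        fields.addAll(Arrays.asList(names));
    }

    /** Flattens only the named fields; all other fields stay interactive. */
    void flatten(List<String> toFlatten) {
        if (toFlatten.isEmpty()) {
            return; // nothing requested: leave the form untouched
        }
        for (String name : toFlatten) {
            // Only fields that actually exist in the form are flattened.
            if (fields.remove(name)) {
                pageContent.add(name);
            }
        }
    }

    Set<String> remainingFields() {
        return Collections.unmodifiableSet(fields);
    }

    List<String> flattenedContent() {
        return Collections.unmodifiableList(pageContent);
    }
}
```

In this model, flattening the single field "name" on a form holding "name", "date" and "total" leaves "date" and "total" interactive, and an empty list changes nothing. The reported bug corresponds to a flatten that clears the whole field set regardless of the list passed in.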



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Updated] (PDFBOX-3809) PDAcroForm.flatten(PDField list, refreshAppearances boolean) flattens all form fields instead of specified ones.

2018-04-14 Thread Maruan Sahyoun (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maruan Sahyoun updated PDFBOX-3809:
---
Fix Version/s: 3.0.0 PDFBox
   2.0.10

> PDAcroForm.flatten(PDField list, refreshAppearances boolean) flattens all 
> form fields instead of specified ones.
> 
>
>                 Key: PDFBOX-3809
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3809
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: AcroForm
>    Affects Versions: 2.0.5, 2.0.6, 2.0.7
>            Reporter: Cristin Donaher
>            Assignee: Maruan Sahyoun
>            Priority: Minor
>             Fix For: 2.0.10, 3.0.0 PDFBox
>
>         Attachments: Example of fields that need to enter and the calculated 
> field from those values.docx, sf270.pdf
>
>
> Thanks for the excellent PDF library.   For my use case I need to flatten a 
> subset of the AcroForm fields.  I was attempting to use the 
> PDAcroForm.flatten call, passing in my field list.  However, after the method 
> is called, all the fields are gone.  
> The method itself appears to remove all PDFAnnotationWidgets from each page 
> and at the end clears the acroform's field set.
> Is the javadoc description (This will flatten the specified form fields.) 
> just misleading?   Could a flatten call for a subset of fields be added?
> Thanks






Build failed in Jenkins: PDFBox-trunk » PDFBox parent #3967

2018-04-14 Thread Apache Jenkins Server
See 


--
Established TCP socket on 39378
maven35-agent.jar already up to date
maven35-interceptor.jar already up to date
maven3-interceptor-commons.jar already up to date
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=300m; 
support was removed in 8.0
<===[JENKINS REMOTING CAPACITY]===>   channel started
Executing Maven:  -B -f 
/home/jenkins/jenkins-slave/workspace/PDFBox-trunk/trunk/pom.xml 
-Dmaven.repo.local=/home/jenkins/jenkins-slave/maven-repositories/0 clean 
deploy -Ppedantic -Dskip-bavaria=false
[INFO] Scanning for projects...
[WARNING] The project org.apache.pdfbox:pdfbox-parent:pom:3.0.0-SNAPSHOT uses 
prerequisites which is only intended for maven-plugin projects but not for non 
maven-plugin projects. For such purposes you should use the 
maven-enforcer-plugin. See 
https://maven.apache.org/enforcer/enforcer-rules/requireMavenVersion.html
[INFO] 
[INFO] Reactor Build Order:
[INFO] 
[INFO] PDFBox parent
[INFO] Apache FontBox
[INFO] Apache XmpBox
[INFO] Apache PDFBox
[INFO] Apache Preflight
[INFO] Apache Preflight application
[INFO] Apache PDFBox Debugger
[INFO] Apache PDFBox tools
[INFO] Apache PDFBox application
[INFO] Apache PDFBox Debugger application
[INFO] Apache PDFBox examples
[INFO] Apache PDFBox
[INFO] 
[INFO] 
[INFO] Building PDFBox parent 3.0.0-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-clean-plugin:3.0.0:clean (default-clean) @ pdfbox-parent ---
[TASKS] Scanning folder 
'
 for files matching the pattern '**/*.java' - excludes: 
[TASKS] Found 0 files to scan for tasks
Found 0 open tasks.
[TASKS] Computing warning deltas based on reference build #3966
[INFO] 
[INFO] --- maven-enforcer-plugin:1.4.1:enforce (enforce-maven-version) @ 
pdfbox-parent ---
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.5:process (process-resource-bundles) 
@ pdfbox-parent ---
[INFO] 
[INFO] --- animal-sniffer-maven-plugin:1.15:check (check-java-version) @ 
pdfbox-parent ---
[INFO] Checking unresolved references to org.codehaus.mojo.signature:java17:1.0
[INFO] 
[INFO] --- maven-site-plugin:3.7:attach-descriptor (attach-descriptor) @ 
pdfbox-parent ---
[INFO] No site descriptor found: nothing to attach.
[INFO] 
[INFO] >>> maven-source-plugin:3.0.1:jar (attach-sources) > generate-sources @ 
pdfbox-parent >>>
[INFO] 
[INFO] --- maven-enforcer-plugin:1.4.1:enforce (enforce-maven-version) @ 
pdfbox-parent ---
[WARNING] Failed to getClass for org.apache.maven.plugins.source.SourceJarMojo
[INFO] 
[INFO] <<< maven-source-plugin:3.0.1:jar (attach-sources) < generate-sources @ 
pdfbox-parent <<<
[INFO] 
[INFO] 
[INFO] --- maven-source-plugin:3.0.1:jar (attach-sources) @ pdfbox-parent ---
[INFO] 
[INFO] --- apache-rat-plugin:0.12:check (default) @ pdfbox-parent ---
[INFO] Enabled default license matchers.
[INFO] Will parse SCM ignores for exclusions...
[INFO] Finished adding exclusions from SCM ignore files.
[INFO] 61 implicit excludes (use -debug for more details).
[INFO] Exclude: release.properties
[INFO] 1 resources included (use -debug for more details)
[INFO] Rat check: Summary over all files. Unapproved: 0, unknown: 0, generated: 
0, approved: 1 licenses.
[INFO] 
[INFO] --- dependency-check-maven:3.1.2:check (default) @ pdfbox-parent ---
[INFO] Checking for updates
[INFO] starting getUpdatesNeeded() ...
[ERROR] Unable to resolve domain 'nvd.nist.gov'
[ERROR] Unable to resolve domain 'nvd.nist.gov'
[ERROR] Unable to resolve domain 'nvd.nist.gov'
[ERROR] Unable to resolve domain 'nvd.nist.gov'
[ERROR] Unable to resolve domain 'nvd.nist.gov'
[ERROR] Unable to resolve domain 'nvd.nist.gov'
[ERROR] Unable to resolve domain 'nvd.nist.gov'
[ERROR] Unable to resolve domain 'nvd.nist.gov'
[ERROR] Unable to resolve domain 'nvd.nist.gov'
[ERROR] Unable to resolve domain 'nvd.nist.gov'
[ERROR] Unable to resolve domain 'nvd.nist.gov'
[ERROR] Unable to resolve domain 'nvd.nist.gov'
[ERROR] Unable to resolve domain 'nvd.nist.gov'
[WARNING] Unable to download the NVD CVE data; the results may not include the 
most recent CPE/CVEs from the NVD.
[INFO] If you are behind a proxy you may need to configure dependency-check to 
use the proxy.
[ERROR] Unable to resolve domain 'nvd.nist.gov'
[ERROR] Unable to resolve domain 'nvd.nist.gov'
[ERROR] Unable to resolve domain 'nvd.nist.gov'
[ERROR] Unable to resolve domain 'nvd.nist.gov'
[ERROR] Unable to resolve domain 'nvd.nist.gov'
[WARNING] Unable to update Cached Web DataSource, using local data instead. 
Results may not include recent vulnerabilities.
[INFO] Analysis Started
[INFO] Finished File Name Analyzer (0 seconds)
[INFO] 

Build failed in Jenkins: PDFBox-trunk #3967

2018-04-14 Thread Apache Jenkins Server
See 


Changes:

[msahyoun] PDFBOX-3809: add missing test for hasMissingPageRef before LOG 
statement

--
[...truncated 156.52 KB...]
AU examples/src/main/java/org/apache/pdfbox/examples/pdmodel/PrintURLs.java
AU examples/src/main/java/org/apache/pdfbox/examples/pdmodel/ExtractTTFFonts.java
AU examples/src/main/java/org/apache/pdfbox/examples/pdmodel/EmbeddedFonts.java
AU examples/src/main/java/org/apache/pdfbox/examples/pdmodel/AddAnnotations.java
AU examples/src/main/java/org/apache/pdfbox/examples/pdmodel/CreateLandscapePDF.java
AU examples/src/main/java/org/apache/pdfbox/examples/pdmodel/CreateGradientShadingPDF.java
AU examples/src/main/java/org/apache/pdfbox/examples/pdmodel/HelloWorld.java
AU examples/src/main/java/org/apache/pdfbox/examples/pdmodel/ExtractMetadata.java
AU examples/src/main/java/org/apache/pdfbox/examples/pdmodel/AddMessageToEachPage.java
AU examples/src/main/java/org/apache/pdfbox/examples/pdmodel/ImageToPDF.java
AU examples/src/main/java/org/apache/pdfbox/examples/pdmodel/AddJavascript.java
AU examples/src/main/java/org/apache/pdfbox/examples/pdmodel/UsingTextMatrix.java
AU examples/src/main/java/org/apache/pdfbox/examples/pdmodel/ReplaceURLs.java
AU examples/src/main/java/org/apache/pdfbox/examples/pdmodel/RubberStamp.java
AU examples/src/main/java/org/apache/pdfbox/examples/pdmodel/PrintDocumentMetaData.java
A  examples/src/main/java/org/apache/pdfbox/examples/pdmodel/CreatePDFA.java
AU examples/src/main/java/org/apache/pdfbox/examples/pdmodel/RemoveFirstPage.java
AU examples/src/main/java/org/apache/pdfbox/examples/pdmodel/ShowTextWithPositioning.java
AU examples/src/main/java/org/apache/pdfbox/examples/pdmodel/PrintBookmarks.java
AU examples/src/main/java/org/apache/pdfbox/examples/pdmodel/CreateBlankPDF.java
A  examples/src/main/java/org/apache/pdfbox/examples/pdmodel/ExtractEmbeddedFiles.java
AU examples/src/main/java/org/apache/pdfbox/examples/pdmodel/SuperimposePage.java
AU examples/src/main/java/org/apache/pdfbox/examples/pdmodel/GoToSecondBookmarkOnOpen.java
AU examples/src/main/java/org/apache/pdfbox/examples/pdmodel/EmbeddedFiles.java
AU examples/src/main/java/org/apache/pdfbox/examples/pdmodel/CreateBookmarks.java
AU examples/src/main/java/org/apache/pdfbox/examples/pdmodel/package.html
AU examples/src/main/java/org/apache/pdfbox/examples/pdmodel/CreatePatternsPDF.java
AU examples/src/main/java/org/apache/pdfbox/examples/pdmodel/AddMetadataFromDocInfo.java
AU examples/src/main/java/org/apache/pdfbox/examples/pdmodel/ShowColorBoxes.java
AU examples/src/main/java/org/apache/pdfbox/examples/pdmodel/RubberStampWithImage.java
AU examples/src/main/java/org/apache/pdfbox/examples/pdmodel/AddImageToPDF.java
AU examples/src/main/java/org/apache/pdfbox/examples/pdmodel/HelloWorldTTF.java
AU examples/src/main/java/org/apache/pdfbox/examples/pdmodel/EmbeddedVerticalFonts.java
AU examples/src/main/java/org/apache/pdfbox/examples/pdmodel/HelloWorldType1.java
A  examples/src/main/java/org/apache/pdfbox/examples/printing
AU examples/src/main/java/org/apache/pdfbox/examples/printing/Printing.java
A  examples/src/main/java/org/apache/pdfbox/examples/lucene
AU examples/src/main/java/org/apache/pdfbox/examples/lucene/package.html
AU examples/src/main/java/org/apache/pdfbox/examples/lucene/IndexPDFFiles.java
AU examples/src/main/java/org/apache/pdfbox/examples/lucene/LucenePDFDocument.java
A  examples/src/main/java/org/apache/pdfbox/examples/interactive
A  examples/src/main/java/org/apache/pdfbox/examples/interactive/form
AU examples/src/main/java/org/apache/pdfbox/examples/interactive/form/DetermineTextFitsField.java
AU examples/src/main/java/org/apache/pdfbox/examples/interactive/form/AddBorderToField.java
AU examples/src/main/java/org/apache/pdfbox/examples/interactive/form/CreateSimpleFormWithEmbeddedFont.java
AU examples/src/main/java/org/apache/pdfbox/examples/interactive/form/FillFormField.java
AU examples/src/main/java/org/apache/pdfbox/examples/interactive/form/SetField.java
AU examples/src/main/java/org/apache/pdfbox/examples/interactive/form/CreateSimpleForm.java
AU examples/src/main/java/org/apache/pdfbox/examples/interactive/form/PrintFields.java
AU examples/src/main/java/org/apache/pdfbox/examples/interactive/form/UpdateFieldOnDocumentOpen.java
AU examples/src/main/java/org/apache/pdfbox/examples/interactive/form/package-info.java
AU examples/src/main/java/org/apache/pdfbox/examples/interactive/form/CreateMultiWidgetsForm.java
AU

[jira] [Commented] (PDFBOX-3809) PDAcroForm.flatten(PDField list, refreshAppearances boolean) flattens all form fields instead of specified ones.

2018-04-14 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438326#comment-16438326
 ] 

ASF subversion and git services commented on PDFBOX-3809:
-

Commit 1829135 from [~msahyoun] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1829135 ]

PDFBOX-3809: add missing test for hasMissingPageRef before LOG statement

> PDAcroForm.flatten(PDField list, refreshAppearances boolean) flattens all 
> form fields instead of specified ones.
> 
>
>                 Key: PDFBOX-3809
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3809
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: AcroForm
>    Affects Versions: 2.0.5, 2.0.6, 2.0.7
>            Reporter: Cristin Donaher
>            Assignee: Maruan Sahyoun
>            Priority: Minor
>         Attachments: Example of fields that need to enter and the calculated 
> field from those values.docx, sf270.pdf
>
>
> Thanks for the excellent PDF library.   For my use case I need to flatten a 
> subset of the AcroForm fields.  I was attempting to use the 
> PDAcroForm.flatten call, passing in my field list.  However, after the method 
> is called, all the fields are gone.  
> The method itself appears to remove all PDFAnnotationWidgets from each page 
> and at the end clears the acroform's field set.
> Is the javadoc description (This will flatten the specified form fields.) 
> just misleading?   Could a flatten call for a subset of fields be added?
> Thanks


