[jira] [Commented] (PDFBOX-5682) Long/permanent hang in PDFBox 3.x

2023-09-12 Thread Maruan Sahyoun (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764358#comment-17764358
 ] 

Maruan Sahyoun commented on PDFBOX-5682:


We could add a benchmark to the benchmark package with the sample to track 
optimizations and also help find regressions later on. WDYT?

> Long/permanent hang in PDFBox 3.x
> -
>
> Key: PDFBOX-5682
> URL: https://issues.apache.org/jira/browse/PDFBOX-5682
> Project: PDFBox
> Issue Type: Bug
> Reporter: Tim Allison
> Priority: Minor
>
> I found two files in the regression tests where we're now getting timeouts at 
> 3 minutes where we weren't before.  Unfortunately, PDFBox's export:text works 
> on both, so it is probably another structural feature, perhaps a problem in 
> Tika?
> This file halts after printing out the header for Table 19 on page 46: 
> https://corpora.tika.apache.org/base/docs/govdocs1/078/078656.pdf
> Pure PDFBox's export:text complains multiple times: "Page skipped due to an 
> invalid or missing type null", but it does finish quickly.
> This file halts after extracting {{"854,793,592"}}: 
> https://corpora.tika.apache.org/base/docs/commoncrawl3_refetched/G7/G7BO7PNCCREVF2BCY5YSYOPYDLMBYASY
> Pure PDFBox's export:text processes this without problem.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Commented] (PDFBOX-5682) Long/permanent hang in PDFBox 3.x

2023-09-12 Thread Jira


[ 
https://issues.apache.org/jira/browse/PDFBOX-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764356#comment-17764356
 ] 

Andreas Lehmkühler commented on PDFBOX-5682:


[~tallison] Thanks for the explanation. That is suboptimal: in the end one 
has to dereference all indirect objects to collect all possible occurrences. 
For example, the first PDF mentioned contains 100k indirect objects, and it 
took some time to dereference them all. I'll see if there is any chance to 
optimize the process.




[jira] [Comment Edited] (PDFBOX-5682) Long/permanent hang in PDFBox 3.x

2023-09-12 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764228#comment-17764228
 ] 

Tim Allison edited comment on PDFBOX-5682 at 9/12/23 2:41 PM:
--

This is the part from that document that is, erm, eye-opening:

{noformat}
4.2 AF entry not in the catalog
4.2.1 General
Most existing applications that take advantage of Associated Files use the AF 
entry in the
document catalog as the place to make the association. However, the concept of
Associated Files goes well beyond association only with the file as a whole, 
and also
allows for defining relations between embedded files and certain pages, 
annotations,
form fields, graphics objects, structure elements in the tagging structure, 
DParts or any
other PDF object.
{noformat}

And, yes, the document goes on to say, PDF writers should do the traditional 
thing, but...






[jira] [Comment Edited] (PDFBOX-5682) Long/permanent hang in PDFBox 3.x

2023-09-12 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764225#comment-17764225
 ] 

Tim Allison edited comment on PDFBOX-5682 at 9/12/23 2:41 PM:
--

Thank you, [~lehmi].  In Tika, we initially copied PDFBox's 
ExtractEmbeddedFiles example, but we found that PDF writers can stuff attached 
files/file specs/associated files on pretty much anything 
(https://www.pdfa.org/wp-content/uploads/2018/10/PDF20_AN002-AF.pdf).

From what we can tell with publicly available corpora, it is rare to have an 
attachment not in the name tree and not in an annotation on a page, but after 
making the change in TIKA-4012, we did find a few new attachments.

This may be a "won't fix" in 3.x. 

Perhaps we allow users to turn off the "scan every object for an embedded file" 
behavior on the Tika side?





[jira] [Comment Edited] (PDFBOX-5682) Long/permanent hang in PDFBox 3.x

2023-09-12 Thread Jira


[ 
https://issues.apache.org/jira/browse/PDFBOX-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764199#comment-17764199
 ] 

Andreas Lehmkühler edited comment on PDFBOX-5682 at 9/12/23 1:56 PM:
-

{quote}It looks like that causes a full parse of the file?{quote}
"getObjectsByType" searches for all indirect objects of the type FILESPEC, so 
all indirect objects have to be loaded on demand, which is more or less the 
whole file. In 2.0.x all objects are already loaded, so calling 
"getObjectsByType" is less expensive than in 3.0.x.

IMHO there are two possible solutions:
* maybe there is some room for improvement when loading all objects
* don't scan all objects when looking for some special object types like files. 
The example "org.apache.pdfbox.examples.pdmodel.ExtractEmbeddedFiles" shows how 
to get all files using PD-level objects. In 3.0.x this should be the preferred 
way to go, as it doesn't scan all indirect objects
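
To make the cost difference concrete, here is a minimal sketch in plain Java. 
It is not PDFBox code: LazyPool, load, scanByType, walkFrom and sample are 
hypothetical names invented for this illustration, and the model only counts 
how many objects have to be dereferenced under each strategy.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of a lazily loaded object pool. Hypothetical; NOT PDFBox code. */
final class LazyPool {
    /** A stored object: a type tag plus the ids of the objects it references. */
    static final class Entry {
        final String type;
        final List<Integer> refs;
        Entry(String type, List<Integer> refs) { this.type = type; this.refs = refs; }
    }

    private final Map<Integer, Entry> onDisk = new HashMap<>(); // not parsed yet
    private final Map<Integer, Entry> loaded = new HashMap<>(); // parsed on demand
    int loadCount = 0; // how many objects had to be dereferenced

    void put(int id, String type, Integer... refs) {
        onDisk.put(id, new Entry(type, List.of(refs)));
    }

    /** Dereferences one object, "parsing" it only on first access. */
    Entry load(int id) {
        Entry e = loaded.get(id);
        if (e == null && onDisk.containsKey(id)) {
            e = onDisk.get(id);
            loaded.put(id, e);
            loadCount++;
        }
        return e;
    }

    /** Scan-everything strategy: each object must be loaded to inspect its type. */
    List<Integer> scanByType(String type) {
        List<Integer> hits = new ArrayList<>();
        for (Integer id : onDisk.keySet()) {
            if (type.equals(load(id).type)) hits.add(id);
        }
        return hits;
    }

    /** Targeted strategy: follow references from a known root only. */
    List<Integer> walkFrom(int root, String type) {
        List<Integer> hits = new ArrayList<>();
        Set<Integer> seen = new HashSet<>();
        Deque<Integer> todo = new ArrayDeque<>(List.of(root));
        while (!todo.isEmpty()) {
            int id = todo.pop();
            if (!seen.add(id)) continue;   // already visited
            Entry e = load(id);
            if (e == null) continue;       // dangling reference
            if (type.equals(e.type)) hits.add(id);
            todo.addAll(e.refs);
        }
        return hits;
    }

    /** Builds catalog(0) -> names(1) -> filespec(2) plus unrelated filler objects. */
    static LazyPool sample(int filler) {
        LazyPool pool = new LazyPool();
        pool.put(0, "Catalog", 1);
        pool.put(1, "Names", 2);
        pool.put(2, "Filespec");
        for (int i = 0; i < filler; i++) pool.put(3 + i, "Other");
        return pool;
    }
}
```

With LazyPool.sample(1000), scanByType("Filespec") dereferences all 1003 
objects, while walkFrom(0, "Filespec") dereferences 3. The trade-off in this 
model is that the targeted walk only finds what is reachable from the chosen 
root.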






[jira] [Updated] (PDFBOX-5654) Avoid NPE when processing CFF2 based fonts

2023-09-12 Thread Jira


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler updated PDFBOX-5654:
---
Fix Version/s: 2.0.30

> Avoid NPE when processing CFF2 based fonts
> --
>
> Key: PDFBOX-5654
> URL: https://issues.apache.org/jira/browse/PDFBOX-5654
> Project: PDFBox
> Issue Type: Bug
> Components: FontBox
> Affects Versions: 3.0.0 PDFBox, 4.0.0
> Reporter: Andreas Lehmkühler
> Assignee: Andreas Lehmkühler
> Priority: Major
> Fix For: 2.0.30, 3.0.1 PDFBox, 4.0.0
>
>
> PDFBOX-5631 adds rendering support for CFF-based fonts. That support is 
> limited to CFF-based fonts; the newer CFF2 format isn't supported, which may 
> lead to an NPE while loading local fonts when creating the font cache.
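
The failure mode described above can be modelled without FontBox. The sketch 
below is only an illustration under stated assumptions: FontScanSketch, 
FakeFont and scan are hypothetical names, a null cffName stands in for a font 
whose CFF table could not be read (for example a CFF2 font), and skipping such 
fonts is one possible way to avoid the NPE, not necessarily what the actual 
commit does.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a font scan that tolerates a missing CFF table; not FontBox code. */
final class FontScanSketch {
    /** Hypothetical stand-in for a parsed OpenType font. */
    static final class FakeFont {
        final String fileName;
        final boolean postScript; // font reports PostScript (CFF) outlines
        final String cffName;     // null models a CFF table that is unavailable

        FakeFont(String fileName, boolean postScript, String cffName) {
            this.fileName = fileName;
            this.postScript = postScript;
            this.cffName = cffName;
        }
    }

    /** Returns the names that would go into the cache, skipping unreadable fonts. */
    static List<String> scan(List<FakeFont> fonts) {
        List<String> cache = new ArrayList<>();
        for (FakeFont font : fonts) {
            if (font.postScript && font.cffName == null) {
                // Skip instead of dereferencing null; a real scanner would likely log this.
                continue;
            }
            cache.add(font.postScript ? font.cffName : font.fileName);
        }
        return cache;
    }

    public static void main(String[] args) {
        List<FakeFont> fonts = List.of(
                new FakeFont("Plain.ttf", false, null),
                new FakeFont("WithCff.otf", true, "WithCff"),
                new FakeFont("Variable-VF.otf", true, null)); // the problematic case
        System.out.println(scan(fonts));
    }
}
```

The guard keeps one unreadable font from aborting the whole scan, at the cost 
of that font being unavailable afterwards.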






[jira] [Closed] (PDFBOX-5677) FileSystemFontProvider::scanFonts fail because OpenTypeFont::getCFF returns null unexpectedly

2023-09-12 Thread Jira


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler closed PDFBOX-5677.
--
Resolution: Duplicate

[~tommix] thanks for the report.

This is a duplicate of PDFBOX-5654

> FileSystemFontProvider::scanFonts fail because OpenTypeFont::getCFF returns 
> null unexpectedly
> -
>
> Key: PDFBOX-5677
> URL: https://issues.apache.org/jira/browse/PDFBOX-5677
> Project: PDFBox
> Issue Type: Bug
> Affects Versions: 2.0.29, 3.0.0 PDFBox
> Environment: Server version name: Apache Tomcat/9.0.80
> OS Name: Linux
> OS Version: 6.5.0-1-MANJARO
> Architecture: amd64
> Java Home: /usr/lib/jvm/zulu-17
> JVM Version: 17.0.8+7-LTS
> JVM Vendor: Azul Systems, Inc.
> Reporter: Tamas Rasztik
> Assignee: Andreas Lehmkühler
> Priority: Major
> Attachments: 2023-09-08_error.png
>
>
> Under Linux there can be fonts (e.g. Cantarell-VF.otf) where 
> OpenTypeFont::isPostScript returns true but OpenTypeFont::getCFF returns 
> null, so a NullPointerException is thrown and FileSystemFontProvider 
> initialisation fails.
> These fonts could be ignored by the FileSystemFontProvider.
> {code:java}
> 08-Sep-2023 10:10:09.633 SEVERE [RMI TCP Connection(2)-127.0.0.1] 
> org.apache.catalina.core.StandardContext.listenerStart Exception sending 
> context initialized event to listener instance of class 
> [org.springframework.web.context.ContextLoaderListener]
>     org.springframework.beans.factory.BeanCreationException: Error creating 
> bean with name 'smartformAction' defined in BeanDefinition defined in 
> ServletContext resource [/WEB-INF/classes/applicationContext_Default.xml]: 
> Initialization of bean failed; nested exception is 
> org.springframework.aop.framework.AopConfigException: Unexpected AOP 
> exception; nested exception is java.lang.IllegalStateException: Unable to 
> load cache item
>         at 
> org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:628)
>         at 
> org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:542)
>         at 
> org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:335)
>         at 
> org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:234)
>         at 
> org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:333)
>         at 
> org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:208)
>         at 
> org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:936)
>         at 
> org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:921)
>         at 
> org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:583)
>         at 
> org.springframework.web.context.ContextLoader.configureAndRefreshWebApplicationContext(ContextLoader.java:399)
>         at 
> org.springframework.web.context.ContextLoader.initWebApplicationContext(ContextLoader.java:278)
>         at 
> org.springframework.web.context.ContextLoaderListener.contextInitialized(ContextLoaderListener.java:103)
>         at 
> org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4462)
>         at 
> org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:4914)
>         at 
> org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:171)
>         at 
> org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:683)
>         at 
> org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:658)
>         at 
> org.apache.catalina.core.StandardHost.addChild(StandardHost.java:662)
>         at 
> org.apache.catalina.startup.HostConfig.manageApp(HostConfig.java:1782)
>         at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>         at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.base/java.lang.reflect.Method.invoke(Method.java:568)
>         at 
> org.apache.tomcat.util.modeler.BaseModelMBean.invoke(BaseModelMBean.java:294)
>         at 
> 

[jira] [Commented] (PDFBOX-5654) Avoid NPE when processing CFF2 based fonts

2023-09-12 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764144#comment-17764144
 ] 

ASF subversion and git services commented on PDFBOX-5654:
-

Commit 1912264 from le...@apache.org in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1912264 ]

PDFBOX-5654: avoid NPE when processing CCF2 fonts




[jira] [Commented] (PDFBOX-3330) Enhance and update PDFBox website & documentation

2023-09-12 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764025#comment-17764025
 ] 

ASF subversion and git services commented on PDFBOX-3330:
-

Commit d879f889285d35c8ec24f017d287020889007702 in pdfbox-docs's branch 
refs/heads/master from Tilman Hausherr
[ https://gitbox.apache.org/repos/asf?p=pdfbox-docs.git;h=d879f889 ]

PDFBOX-3330: improve support page

> Enhance and update PDFBox website & documentation
> -
>
> Key: PDFBOX-3330
> URL: https://issues.apache.org/jira/browse/PDFBOX-3330
> Project: PDFBox
> Issue Type: Task
> Components: Documentation
> Reporter: Maruan Sahyoun
> Priority: Major
> Attachments: Bildschirmfoto von »2018-03-14 22-59-10«.png, 
> Bildschirmfoto von »2018-03-14 22-59-21«.png, PDFBox.Logo-0.1.0.png, 
> pdfbox-topbar.pdf, screenshot-1.png, toolbox.svg, topbar.png
>
>
> General purpose ticket to track enhancements to the website and documentation






[jira] [Commented] (PDFBOX-5678) Getting started page talks about LittleCMS perf issue, but it's fixed in Java 18

2023-09-12 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764021#comment-17764021
 ] 

Tilman Hausherr commented on PDFBOX-5678:
-

I'll wait a few days in case there's some disagreement, and then I'll do it.

> Getting started page talks about LittleCMS perf issue, but it's fixed in Java 
> 18
> 
>
> Key: PDFBOX-5678
> URL: https://issues.apache.org/jira/browse/PDFBOX-5678
> Project: PDFBox
> Issue Type: Bug
> Components: Documentation
> Reporter: Mike Hearn
> Priority: Major
>
> [https://pdfbox.apache.org/3.0/getting-started.html]
> This page contains possibly obsolete advice: "Due to the change of the java 
> color management module towards "LittleCMS", users can experience slow 
> performance in color operations"
> The bug linked as the explanation was marked as fixed in Java 18. It 
> turned out that LittleCMS was never really the problem; inefficient use of 
> JNI was:
> [https://github.com/openjdk/jdk/pull/5835]
> So hopefully this advice can now be gated on "If you are using a Java 
> version lower than 18"
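
For readers who want to apply the same gate in code rather than in the 
documentation, a minimal sketch follows. CmmAdviceGate and adviceApplies are 
hypothetical names; the threshold of 18 is taken from the report above, and 
Runtime.version().feature() requires Java 10 or later.

```java
/** Sketch of gating the color-management advice on the running Java version. */
final class CmmAdviceGate {
    /** Java 18 is the release the report cites as containing the fix. */
    static final int FIXED_IN_FEATURE_RELEASE = 18;

    /** True when the running Java feature version predates the cited fix. */
    static boolean adviceApplies(int featureVersion) {
        return featureVersion < FIXED_IN_FEATURE_RELEASE;
    }

    public static void main(String[] args) {
        int feature = Runtime.version().feature(); // e.g. 17 for Java 17
        System.out.println("Java " + feature + ": slow-color advice "
                + (adviceApplies(feature) ? "may apply" : "should not be needed"));
    }
}
```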






[jira] [Assigned] (PDFBOX-5681) ConcurrentModificationException in getObjectsByType() in 3.x

2023-09-12 Thread Jira


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler reassigned PDFBOX-5681:
--

Assignee: Andreas Lehmkühler

> ConcurrentModificationException in getObjectsByType() in 3.x
> 
>
> Key: PDFBOX-5681
> URL: https://issues.apache.org/jira/browse/PDFBOX-5681
> Project: PDFBox
> Issue Type: Bug
> Affects Versions: 3.0.0 PDFBox
> Reporter: Tim Allison
> Assignee: Andreas Lehmkühler
> Priority: Minor
> Attachments: PDFBOX-3714-2.pdf
>
>
> [~tilman]'s regression testing turned up this exception when we integrated 
> PDFBox 3.0.0 into Tika:
> {noformat}
> java.util.ConcurrentModificationException
>   at java.base/java.util.HashMap$HashIterator.nextNode(HashMap.java:1597)
>   at java.base/java.util.HashMap$KeyIterator.next(HashMap.java:1620)
>   at 
> org.apache.pdfbox.cos.COSDocument.getObjectsByType(COSDocument.java:254)
>   at 
> org.apache.pdfbox.cos.COSDocument.getObjectsByType(COSDocument.java:240)
> {noformat}
> I can replicate this exception consistently on the attached file.
> With this code:
> {noformat}
> Path path = Paths.get("/.../PDFBOX-3714-2.pdf");
> PDDocument document = Loader.loadPDF(path.toFile());
> List<COSObject> objs = document.getDocument().getObjectsByType(COSName.FILESPEC);
> {noformat}
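
The trace above ends in HashMap$KeyIterator.next, which is where the JDK's 
fail-fast check fires when a HashMap is structurally modified while one of its 
iterators is in use. The plain-Java sketch below reproduces that exception type 
and shows one common way to avoid it, iterating over a snapshot of the keys. 
It is not PDFBox code; CmeDemo and its methods are hypothetical, and whether 
on-demand loading is what modifies the map in 3.0.0 is an assumption that the 
report does not confirm.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java illustration of the exception type in the trace; not PDFBox code. */
final class CmeDemo {
    /** Iterates the live key set while the loop body adds entries to the same map. */
    static boolean failsOnLiveKeySet() {
        Map<Integer, String> pool = new HashMap<>(Map.of(1, "a", 2, "b", 3, "c"));
        try {
            for (Integer key : pool.keySet()) {
                pool.put(key + 100, "added while iterating"); // structural modification
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true; // thrown by the iterator's next() after the first put
        }
    }

    /** Iterates a snapshot of the keys, so additions to the map are safe. */
    static int sizeAfterSnapshotIteration() {
        Map<Integer, String> pool = new HashMap<>(Map.of(1, "a", 2, "b", 3, "c"));
        List<Integer> snapshot = new ArrayList<>(pool.keySet());
        for (Integer key : snapshot) {
            pool.put(key + 100, "added while iterating");
        }
        return pool.size(); // three original keys plus three added keys
    }

    public static void main(String[] args) {
        System.out.println("live key set fails: " + failsOnLiveKeySet());
        System.out.println("size after snapshot iteration: " + sizeAfterSnapshotIteration());
    }
}
```

The snapshot costs one extra copy of the key set, and entries added during the 
loop are not visited by it.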


