Re: Help with NullPointerException org.apache.io.IOUtils.LOG

2024-03-15 Thread Tilman Hausherr
Searching for the error message I found this in a comment: https://stackoverflow.com/questions/69151291/java-16-modularisation-illegalaccessexception-java-nio-spring-boot |--add-opens java.base/java.nio=ALL-UNNAMED --add-opens java.base/jdk.internal.ref=ALL-UNNAMED| Tilman On 15.03.2024

RE: Help with NullPointerException org.apache.io.IOUtils.LOG

2024-03-15 Thread Matthew Hardy
Hi Andreas, I've upgraded to pdfbox 3.0.2, and I'm no longer getting the ExceptionInInitializerError when instantiating an empty PDDocument. However, I'm now receiving this error message: ERROR [org.apache.pdfbox.io.IOUtils] (EE-ManagedExecutorService-default-Thread-1) Unmapping is not supported.:

Re: AFMParser optimization

2024-03-15 Thread Tilman Hausherr
Hi, Thank you, done. Tilman On 15.03.2024 14:49, Guillaume Maillrd wrote: Hi, During a profiling session of my application, I found something that could interest you. To speed up the AFMParser (50% gain), the "equals" in parseCharMetric should be written in this order (order of top 5

Re: Type 0 font - Text extraction X PDF Debugger

2024-03-15 Thread Tilman Hausherr
Yes, identity does work for that file. However, using that logic fails to provide the correct results for other files with an unusable /ToUnicode stream. Yes, there can be larger blocks. My suspicion is that the tools that use "identity" for your file will fail for some of the files. Unless we

AFMParser optimization

2024-03-15 Thread Guillaume Maillrd
Hi, During a profiling session of my application, I found something that could interest you. To speed up the AFMParser (50% gain), the "equals" in parseCharMetric should be written in this order (order of top 5 usage) : if (nextCommand.equals(CHARMETRICS_C)) { ... } else if
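
A small sketch of why the order of an if/else "equals" chain matters: it counts the string comparisons needed under two orderings. The class, method names, key list and both orderings are illustrative assumptions; they are not the PDFBox code and not the order proposed in the message, which is truncated above.

```java
import java.util.Arrays;
import java.util.List;

public class EqualsOrderSketch {
    // Number of equals() calls a linear if/else chain makes before it matches `command`.
    static int comparisons(List<String> chainOrder, String command) {
        int n = 0;
        for (String candidate : chainOrder) {
            n++;
            if (candidate.equals(command)) {
                return n;
            }
        }
        return n; // no branch matched: every candidate was tested
    }

    // Total equals() calls for a whole sequence of commands.
    static int totalComparisons(List<String> chainOrder, String[] commands) {
        int total = 0;
        for (String c : commands) {
            total += comparisons(chainOrder, c);
        }
        return total;
    }

    public static void main(String[] args) {
        // Keys of one AFM-style character metric line, e.g. "C 32 ; WX 278 ; N space ; B 0 0 0 0 ;"
        String[] commands = {"C", "WX", "N", "B"};
        // Assumed orderings, for illustration only.
        List<String> rareFirst = Arrays.asList("CH", "W0X", "W1X", "C", "WX", "N", "B");
        List<String> frequentFirst = Arrays.asList("C", "WX", "N", "B", "CH", "W0X", "W1X");
        System.out.println(totalComparisons(rareFirst, commands));     // prints 22
        System.out.println(totalComparisons(frequentFirst, commands)); // prints 10
    }
}
```

The saving is per metric line, so it adds up over a font file with hundreds of glyph entries; the actual gain depends on the real key frequencies, which only profiling can tell.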

Re: Bugfix for FileSystemFontProvider

2024-03-15 Thread Guillaume Maillrd
Hi, Thanks, sorry for this duplicate. I hope 2.0.31 will be released soon. Regards, Guillaume Le 15/03/2024 à 13:51, Tilman Hausherr a écrit : Hi, Yeah, "never happens" is a red flag. That part has been changed to use CRC32:

Re: Type 0 font - Text extraction X PDF Debugger

2024-03-15 Thread Luiz Marcelo Modesto
Thank you Tilman! I'll try to read ISO 32000-2:2020 again to look for some kind of precedence rules regarding the way of decoding string codes to Unicode chars. My impression is that there are some choices but I don't remember if there is something assertive or not. Maybe it could be just an

Re: Bugfix for FileSystemFontProvider

2024-03-15 Thread Tilman Hausherr
Hi, Yeah, "never happens" is a red flag. That part has been changed to use CRC32: https://svn.apache.org/viewvc/pdfbox/branches/2.0/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/font/FileSystemFontProvider.java?revision=1916176=markup#l923 https://issues.apache.org/jira/browse/PDFBOX-5727

Bugfix for FileSystemFontProvider

2024-03-15 Thread Guillaume Maillrd
Hi, In version 2.0.30, a typo in computeHash from FileSystemFontProvider makes every hash come back as "". It breaks the cache logic, resulting in a very slow loadDiskCache. Please replace "SHA512" by "SHA-512" or backport the v3 code to use CRC32. The "// never happens" comment looks funny. Best
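
A minimal sketch of the two fixes mentioned in the report, assuming a standard JDK: request the digest by its standard name "SHA-512", or use java.util.zip.CRC32 instead. The class and helper names are hypothetical and this is not the PDFBox implementation; it only illustrates failing loudly rather than quietly returning an empty hash.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.CRC32;

public class FontHashSketch {
    // Hypothetical helper: hex-encoded SHA-512 digest of the input bytes.
    static String sha512Hex(byte[] data) {
        try {
            // "SHA-512" is the standard algorithm name.
            MessageDigest md = MessageDigest.getInstance("SHA-512");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Surface the problem instead of returning "" for every input.
            throw new IllegalStateException("SHA-512 not available", e);
        }
    }

    // Hypothetical helper: hex-encoded CRC32 checksum; no checked exception involved.
    static String crc32Hex(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return Long.toHexString(crc.getValue());
    }

    public static void main(String[] args) {
        byte[] sample = "123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.println(sha512Hex(sample).length()); // prints 128 (64 bytes as hex)
        System.out.println(crc32Hex(sample));           // prints cbf43926
    }
}
```

CRC32 is not a cryptographic hash, but for detecting a changed font file in a cache that is usually not a requirement, and it is cheaper to compute.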