[jira] [Commented] (PDFBOX-4549) No Unicode mapping

2020-01-08 Thread Sergey Makarov (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17011508#comment-17011508
 ] 

Sergey Makarov commented on PDFBOX-4549:


[~Giorgy], please share an example file; it's important for a project where we 
are using PDFBox.

> No Unicode mapping
> --
>
> Key: PDFBOX-4549
> URL: https://issues.apache.org/jira/browse/PDFBOX-4549
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Affects Versions: 2.0.15
>Reporter: Sergey Makarov
>Assignee: Tilman Hausherr
>Priority: Major
> Fix For: 2.0.16, 3.0.0 PDFBox
>
> Attachments: XO_Thames.zip, our_star_wars.pdf
>
>
> Hello, when I try to get the text from the attached PDF, I get empty output and 
> many warnings. The font is attached as well.
>  Adobe Acrobat Reader opens it successfully, and I can select and copy the text and save it as text.
> My code:
> {code:java}
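> import java.io.File;
> import java.io.IOException;
> import org.apache.pdfbox.io.MemoryUsageSetting;
> import org.apache.pdfbox.pdmodel.PDDocument;
> import org.apache.pdfbox.text.PDFTextStripper;
>
> // class wrapper added so the snippet compiles; the name is arbitrary
> public class ParseOneExample {
>     private static void parseOne(String path) throws IOException {
>         File file = new File(path);
>         // memory settings and temp dir exactly as in the original report
>         MemoryUsageSetting memUsageSetting = MemoryUsageSetting.setupMixed(0, 5)
>                 .setTempDir(new File("/home/user/pdfBoxTest/newFiles/"));
>         // try-with-resources closes the document even if text extraction throws
>         try (PDDocument document = PDDocument.load(file, memUsageSetting)) {
>             if (!document.isEncrypted()) {
>                 PDFTextStripper tStripper = new PDFTextStripper();
>                 System.out.print(tStripper.getText(document));
>             }
>         }
>     }
> }{code}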
> Error:
> {code:java}
> May 15, 2019 6:30:01 PM org.apache.pdfbox.pdmodel.font.PDFont 
> WARNING: Invalid ToUnicode CMap in font HPDFAA+XOThames
> May 15, 2019 6:30:01 PM org.apache.pdfbox.pdmodel.font.PDType0Font toUnicode
> WARNING: No Unicode mapping for CID+83 (83) in font HPDFAA+XOThames
> May 15, 2019 6:30:01 PM org.apache.pdfbox.pdmodel.font.PDType0Font toUnicode
> WARNING: No Unicode mapping for CID+116 (116) in font HPDFAA+XOThames
> May 15, 2019 6:30:01 PM org.apache.pdfbox.pdmodel.font.PDType0Font toUnicode
> WARNING: No Unicode mapping for CID+97 (97) in font HPDFAA+XOThames
> May 15, 2019 6:30:01 PM org.apache.pdfbox.pdmodel.font.PDType0Font toUnicode
> WARNING: No Unicode mapping for CID+114 (114) in font HPDFAA+XOThames
> May 15, 2019 6:30:01 PM org.apache.pdfbox.pdmodel.font.PDType0Font toUnicode
> WARNING: No Unicode mapping for CID+87 (87) in font HPDFAA+XOThames
> May 15, 2019 6:30:01 PM org.apache.pdfbox.pdmodel.font.PDType0Font toUnicode
> WARNING: No Unicode mapping for CID+115 (115) in font HPDFAA+XOThames
> May 15, 2019 6:30:01 PM org.apache.pdfbox.pdmodel.font.PDFont 
> WARNING: Invalid ToUnicode CMap in font HPDFAB+DejaVuSansMono,Book
> {code}
>  
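The warnings say the ToUnicode CMap of the font is invalid, so PDFBox has no table to turn the character codes (CIDs) into Unicode, which is consistent with the empty output. A minimal, stdlib-only sketch of what such a lookup does follows; the class `ToUnicodeSketch` and its identity fallback are hypothetical and are not PDFBox API. Read as ASCII, the logged CIDs 83, 116, 97, 114, 87, 115 are S, t, a, r, W, s, which fits the file name our_star_wars.pdf, but an identity guess like this only happens to work for some fonts and is wrong in general.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a ToUnicode-style lookup; not PDFBox's implementation.
final class ToUnicodeSketch {
    /**
     * Decodes character codes (CIDs) using a code-to-Unicode table.
     * Codes with no entry are recorded in {@code unmapped}; they are dropped,
     * or, if {@code identityFallback} is set, guessed as the code point with the
     * same number (an assumption that holds only for some fonts).
     */
    static String decode(int[] cids, Map<Integer, String> toUnicode,
                         boolean identityFallback, List<Integer> unmapped) {
        StringBuilder sb = new StringBuilder();
        for (int cid : cids) {
            String mapped = toUnicode.get(cid);
            if (mapped != null) {
                sb.append(mapped);
            } else {
                unmapped.add(cid);
                if (identityFallback) {
                    sb.appendCodePoint(cid);
                }
            }
        }
        return sb.toString();
    }
}
```

With an empty table and no fallback, the CIDs from the warnings decode to an empty string, mirroring the reported output; with the identity fallback they decode to "StarWs".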



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Closed] (PDFBOX-4736) java.io.IOException: Error: End-of-File, expected line

2020-01-08 Thread Jira


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler closed PDFBOX-4736.
--
Resolution: Invalid

I've downloaded the PDF manually and it can be rendered without any problems 
using PDFBox.

The website requires JavaScript, so simply opening an HTTP connection to 
download the file won't work. This is not an issue with PDFBox but with your 
code, or rather your expectation. You have to download the file manually or 
use a URL that downloads the file directly.
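This diagnosis suggests a cheap guard before parsing: an empty response or an HTML/JavaScript page does not start with the `%PDF-` header that PDF files normally begin with, and the parser then likely runs out of input while looking for that header line (note the `parseHeader` frame in the stack trace of this report). A minimal stdlib-only sketch follows; `PdfHeaderCheck` and `startsWithPdfHeader` are hypothetical names, and because lenient parsers tolerate some leading junk before the header, treat a `false` result as a strong hint rather than proof.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical pre-check, not part of PDFBox: does the stream start with "%PDF-"?
final class PdfHeaderCheck {
    private static final byte[] MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Peeks at the first bytes and rewinds, so the same stream can afterwards be
    // passed to a PDF parser. BufferedInputStream supports mark/reset.
    static boolean startsWithPdfHeader(BufferedInputStream in) throws IOException {
        in.mark(MAGIC.length);
        try {
            for (byte expected : MAGIC) {
                if (in.read() != (expected & 0xFF)) {
                    return false; // different byte, or end of stream (-1)
                }
            }
            return true;
        } finally {
            in.reset(); // rewind to where mark() was called
        }
    }
}
```

In the reporter's code this would mean wrapping `connection.getInputStream()` in a `BufferedInputStream`, calling the check, and only then passing the stream to `PDDocument.load`. `connection.getContentType()` is another quick signal, though servers do not always set it correctly.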


> java.io.IOException: Error: End-of-File, expected line
> --
>
> Key: PDFBOX-4736
> URL: https://issues.apache.org/jira/browse/PDFBOX-4736
> Project: PDFBox
>  Issue Type: Bug
>  Components: Parsing
>Affects Versions: 2.0.18
> Environment: Windows
>Reporter: Akos Kovacs
>Assignee: Andreas Lehmkühler
>Priority: Major
>
> I tried to read a PDF file from a given URL, but I got the following error message:
> {code:java}
> Exception in thread "main" java.io.IOException: Error: End-of-File, expected line
>     at org.apache.pdfbox.pdfparser.BaseParser.readLine(BaseParser.java:1124)
>     at org.apache.pdfbox.pdfparser.COSParser.parseHeader(COSParser.java:2595)
>     at org.apache.pdfbox.pdfparser.COSParser.parsePDFHeader(COSParser.java:2574)
>     at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:219)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1222)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1122)
>     at ScreenshotFromPdf.Pdf2Image(ScreenshotFromPdf.java:19)
>     at ScreenshotFromPdf.main(ScreenshotFromPdf.java:33){code}
> Example pdf file: [http://aplaidshirt.epizy.com/samplePDF.pdf]
> Code:
> {code:java}
> import java.awt.image.BufferedImage;
> import java.io.File;
> import java.io.IOException;
> import java.io.InputStream;
> import java.net.HttpURLConnection;
> import java.net.URL;
> import javax.imageio.ImageIO;
> import org.apache.pdfbox.pdmodel.PDDocument;
> import org.apache.pdfbox.rendering.ImageType;
> import org.apache.pdfbox.rendering.PDFRenderer;
>
> public class ScreenshotFromPdf {
>     public static void Pdf2Image(String html) throws IOException, InterruptedException {
>         Thread.sleep(5000);
>         URL url = new URL(html);
>         HttpURLConnection connection = (HttpURLConnection) url.openConnection();
>         // try-with-resources closes both the HTTP stream and the document
>         try (InputStream is = connection.getInputStream();
>              PDDocument document = PDDocument.load(is)) {
>             PDFRenderer pdfRenderer = new PDFRenderer(document);
>             for (int page = 0; page < document.getNumberOfPages(); ++page) {
>                 BufferedImage bim = pdfRenderer.renderImageWithDPI(page, 300, ImageType.RGB);
>                 File outputFile = new File("C:\\_privat\\pdftest\\" + page + "image.jpg");
>                 System.out.println(outputFile.toString());
>                 ImageIO.write(bim, "jpg", outputFile);
>             }
>         }
>     }
>
>     public static void main(String[] args) throws IOException, InterruptedException {
>         String url = "http://aplaidshirt.epizy.com/samplePDF.pdf";
>         ScreenshotFromPdf.Pdf2Image(url);
>     }
> }{code}






[jira] [Assigned] (PDFBOX-4736) java.io.IOException: Error: End-of-File, expected line

2020-01-08 Thread Jira


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler reassigned PDFBOX-4736:
--

Assignee: Andreas Lehmkühler




[jira] [Commented] (PDFBOX-4723) Add equals() and hashCode() to PDAnnotation and COS objects

2020-01-08 Thread Maruan Sahyoun (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010876#comment-17010876
 ] 

Maruan Sahyoun commented on PDFBOX-4723:


Thanks [~mkl], I'm aware of this. But without sorting that out, the whole effort 
is pointless. Currently I have only 3 failing tests, all related to merging 
AcroForms. I hope to find the cause soon.

> Add equals() and hashCode() to PDAnnotation and COS objects
> ---
>
> Key: PDFBOX-4723
> URL: https://issues.apache.org/jira/browse/PDFBOX-4723
> Project: PDFBox
>  Issue Type: Sub-task
>  Components: PDModel
>Affects Versions: 2.0.18
>Reporter: Maruan Sahyoun
>Assignee: Maruan Sahyoun
>Priority: Major
> Fix For: 2.0.19, 3.0.0 PDFBox
>
>
> In order to properly support removeAll/retainAll for COSArrayList, we need to 
> detect whether entries are in fact duplicates of others. This currently fails 
> because, even if one adds the same annotation object instance multiple times 
> via setAnnotations, getting the annotations returns individual instances. 
> See the discussion at PDFBOX-4669.
> To properly support removal, we need to be able to detect equality, where an 
> object is equal if the underlying COSDictionary has the same entries.
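The equality described in the issue (two objects are equal if their underlying dictionaries have the same entries) amounts to delegating `equals()` and `hashCode()` to the wrapped dictionary. A minimal sketch, assuming the dictionary is modelled as a plain `java.util.Map` and using a hypothetical `AnnotationWrapper` class rather than the real PDAnnotation:

```java
import java.util.Map;

// Hypothetical wrapper, loosely analogous to a PD* object wrapping a dictionary.
final class AnnotationWrapper {
    private final Map<String, Object> dict;

    AnnotationWrapper(Map<String, Object> dict) {
        this.dict = dict;
    }

    Map<String, Object> getDictionary() {
        return dict;
    }

    // Two wrappers are equal when their underlying dictionaries have the same entries.
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof AnnotationWrapper)) {
            return false;
        }
        return dict.equals(((AnnotationWrapper) o).dict);
    }

    // Must agree with equals(): equal dictionaries give equal hash codes.
    @Override
    public int hashCode() {
        return dict.hashCode();
    }
}
```

With this in place, `removeAll` on a list of wrappers removes every wrapper whose dictionary has equal content, including distinct annotations that merely look identical; whether that is acceptable is a design decision for the real API.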






[jira] [Commented] (PDFBOX-4549) No Unicode mapping

2020-01-08 Thread Jorge Spinsanti (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010851#comment-17010851
 ] 

Jorge Spinsanti commented on PDFBOX-4549:
-

[~tilman] about the comment of [~tallison]:
{quote}These are good points Michael Klink. See e.g.: 
[http://www.vintasoft.com/forums/viewtopic.php?t=2320] for willful/intentional 
obfuscation of text.
{quote}
Can the obfuscation be detected in advance, without doing text extraction? If so, 
[~tallison] could use that to throw an exception in Tika, such as 
`PDFProtectedException` or similar.




[jira] [Commented] (PDFBOX-4723) Add equals() and hashCode() to PDAnnotation and COS objects

2020-01-08 Thread Michael Klink (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010790#comment-17010790
 ] 

Michael Klink commented on PDFBOX-4723:
---

Beware, though, if you push this further and further, you'll eventually have to 
deal with circular references during hash calculation, and then the fun really 
starts... ;)
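The warning can be made concrete: in a PDF a dictionary can reach itself again (for example, an annotation's /P entry points to its page, whose /Annots array contains the annotation), so a naively recursive content-based `hashCode()` never terminates. `java.util.HashMap` shows the same problem with a self-referencing map. One common remedy, sketched below under the assumption that dictionaries and arrays are modelled as plain `Map`/`List` objects (`CycleSafeHash` is a hypothetical class, not PDFBox code), is to track the containers on the current recursion path by identity and hash a back-reference as a constant.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical cycle-safe, content-based hash over nested Map/List structures.
final class CycleSafeHash {
    static int deepHash(Object o) {
        // Identity-based set: containers are compared by ==, not by content.
        return deepHash(o, Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>()));
    }

    private static int deepHash(Object o, Set<Object> onPath) {
        if (!(o instanceof Map) && !(o instanceof List)) {
            return Objects.hashCode(o); // leaf value: strings, numbers, null
        }
        if (!onPath.add(o)) {
            return 0; // back-reference to a container already being hashed: stop here
        }
        try {
            int h;
            if (o instanceof Map) {
                h = 0; // order-independent sum, like java.util.Map#hashCode
                for (Map.Entry<?, ?> e : ((Map<?, ?>) o).entrySet()) {
                    h += Objects.hashCode(e.getKey()) ^ deepHash(e.getValue(), onPath);
                }
            } else {
                h = 1; // order-dependent, like java.util.List#hashCode
                for (Object item : (List<?>) o) {
                    h = 31 * h + deepHash(item, onPath);
                }
            }
            return h;
        } finally {
            onPath.remove(o); // leaving this container: it is no longer on the path
        }
    }
}
```

Structurally equal graphs, cycles included, get equal hashes. A content-based `equals()` needs a similar guard (for example, tracking pairs of objects already being compared), which this sketch does not cover.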




[jira] [Commented] (PDFBOX-4549) No Unicode mapping

2020-01-08 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010747#comment-17010747
 ] 

Tim Allison commented on PDFBOX-4549:
-

And then there's this gem on content masking attacks: 
[https://www.usenix.org/conference/usenixsecurity17/technical-sessions/presentation/markwood]
 . Many thanks to Peter Wyatt for bringing Markwood et al.'s work to my 
attention.




[jira] [Commented] (PDFBOX-4549) No Unicode mapping

2020-01-08 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010741#comment-17010741
 ] 

Tim Allison commented on PDFBOX-4549:
-

These are good points [~mkl]. See e.g.: 
[http://www.vintasoft.com/forums/viewtopic.php?t=2320] for willful/intentional 
obfuscation of text.

Note that Google is running OCR on at least some PDFs.  See slides 50-51: 
[https://github.com/tballison/share/blob/master/slides/activate19/Activate2019_tika_tallison_20190911.pptx]

And even OCR can be gamed: [https://arxiv.org/abs/1802.05385]

 

:(
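One practical consequence of this discussion is deciding when extracted text is too poor to trust, so that OCR (with its own caveats, as the linked paper shows) might be worth a try. A crude, hypothetical heuristic, not something PDFBox or Tika provides, is the share of non-whitespace characters that are letters, digits or common punctuation. It can flag empty or garbage output such as U+FFFD replacement characters, but it cannot detect content masking, where the extracted characters are plausible yet wrong.

```java
// Hypothetical heuristic, not part of PDFBox or Tika: share of non-whitespace
// characters that look like ordinary text (letters, digits, common punctuation).
final class ExtractionQuality {
    static double plausibleRatio(String text) {
        int total = 0;
        int plausible = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp); // step over surrogate pairs correctly
            if (Character.isWhitespace(cp)) {
                continue; // whitespace says little either way
            }
            total++;
            if (Character.isLetterOrDigit(cp) || isCommonPunctuation(cp)) {
                plausible++;
            }
        }
        // No non-whitespace characters at all: no evidence of usable text.
        return total == 0 ? 0.0 : (double) plausible / total;
    }

    private static boolean isCommonPunctuation(int cp) {
        return ".,;:!?'\"()-".indexOf(cp) >= 0;
    }
}
```

Any threshold for "too poor" would have to be tuned on real documents; the sketch only computes the ratio.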




[jira] [Comment Edited] (PDFBOX-4723) Add equals() and hashCode() to PDAnnotation and COS objects

2020-01-08 Thread Maruan Sahyoun (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010722#comment-17010722
 ] 

Maruan Sahyoun edited comment on PDFBOX-4723 at 1/8/20 2:48 PM:


Quick update. I added the methods to SmallMap and they work fine. But we are 
now getting some failing tests with StackOverflowErrors. One set of errors was 
caused by using {{equals}} instead of {{==}} for checking identity in 
{{PDFMergerUtility}}, which obviously no longer works. As soon as I've found the 
others (probably the same type of cause), I'll commit to trunk first.
 
This is the diff
{noformat}
-if (destValue != null && destValue.equals(entry.getValue()))
+if (destValue != null && destValue == entry.getValue())
 {
 // already exists, but identical
{noformat}

As can be seen, {{equals}} is wrong in that case and only worked because of the 
missing {{equals}} implementation.
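The diff hinges on the difference between identity (`==`) and content equality (`equals`), which only becomes visible once `equals` compares content. The sketch below is a hypothetical merge loop over plain `Map`s, only loosely modelled on the diff and not the PDFMergerUtility code: it skips a value only when the destination already holds the very same object, while an equal-but-distinct copy is still treated as new. What the real merger does with non-identical values is not shown in the comment, so this sketch simply overwrites.

```java
import java.util.Map;

// Hypothetical merge loop over plain maps; not PDFMergerUtility's code.
final class IdentityMergeSketch {
    // Copies entries from src into dest and returns how many were copied.
    // An entry is skipped only if dest already holds the very same object (==);
    // a value that is merely equal (equals) is still copied.
    static int mergeInto(Map<String, Object> dest, Map<String, Object> src) {
        int copied = 0;
        for (Map.Entry<String, Object> entry : src.entrySet()) {
            Object destValue = dest.get(entry.getKey());
            if (destValue != null && destValue == entry.getValue()) {
                continue; // already exists, and it is the identical instance
            }
            dest.put(entry.getKey(), entry.getValue());
            copied++;
        }
        return copied;
    }
}
```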


was (Author: msahyoun):
Quick update. I added the methods to SmallMap and they work fine. But we are 
now getting some failing tests with StackOverflowErrors. One set of errors was 
caused by using {{equals}} instead of {{==}} for checking identity in 
{{PDFMergerUtility}}, which obviously no longer works. As soon as I've found the 
others (probably the same type of cause), I'll commit to trunk first.
 




[jira] [Commented] (PDFBOX-4723) Add equals() and hashCode() to PDAnnotation and COS objects

2020-01-08 Thread Maruan Sahyoun (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010722#comment-17010722
 ] 

Maruan Sahyoun commented on PDFBOX-4723:


Quick update. I added the methods to SmallMap and they work fine. But we are 
now getting some failing tests with StackOverflowErrors. One set of errors was 
caused by using {{equals}} instead of {{==}} for checking identity in 
{{PDFMergerUtility}}, which obviously no longer works. As soon as I've found the 
others (probably the same type of cause), I'll commit to trunk first.
 




[jira] [Comment Edited] (PDFBOX-4723) Add equals() and hashCode() to PDAnnotation and COS objects

2020-01-08 Thread Maruan Sahyoun (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010622#comment-17010622
 ] 

Maruan Sahyoun edited comment on PDFBOX-4723 at 1/8/20 12:08 PM:
-

[~mkl] thanks for pointing it out. Now this gets more and more difficult :-)

A "standard" java Map like HashMap is equal if it has the same content. For 
SmallMap that's not true. IMHO this is against the users expectation. So I will 
look into implementaing equals and hashCode for SmallMap too.
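`java.util.Map` defines equality by content: two maps are equal if they hold the same mappings, and `hashCode()` is the sum of `key.hashCode() ^ value.hashCode()` over all entries, which does not depend on insertion order. A minimal sketch of that contract for a small array-backed map follows; `TinyMap` is a hypothetical stand-in, not PDFBox's SmallMap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical, simplified stand-in for a small array-backed map (not PDFBox's SmallMap).
final class TinyMap<K, V> {
    private final List<K> keys = new ArrayList<>();
    private final List<V> values = new ArrayList<>();

    V put(K key, V value) {
        int i = keys.indexOf(key);
        if (i >= 0) {
            return values.set(i, value); // replace, returning the previous value
        }
        keys.add(key);
        values.add(value);
        return null;
    }

    V get(Object key) {
        int i = keys.indexOf(key);
        return i >= 0 ? values.get(i) : null;
    }

    int size() {
        return keys.size();
    }

    // Content-based equality: same size and every key maps to an equal value,
    // regardless of insertion order.
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof TinyMap)) {
            return false;
        }
        TinyMap<?, ?> other = (TinyMap<?, ?>) o;
        if (size() != other.size()) {
            return false;
        }
        for (int i = 0; i < keys.size(); i++) {
            int j = other.keys.indexOf(keys.get(i));
            if (j < 0 || !Objects.equals(values.get(i), other.values.get(j))) {
                return false;
            }
        }
        return true;
    }

    // Same formula as java.util.Map#hashCode: sum of (keyHash ^ valueHash),
    // order-independent and consistent with equals above.
    @Override
    public int hashCode() {
        int h = 0;
        for (int i = 0; i < keys.size(); i++) {
            h += Objects.hashCode(keys.get(i)) ^ Objects.hashCode(values.get(i));
        }
        return h;
    }
}
```

If the real class implements `java.util.Map`, the contract also requires it to be equal to other `Map` implementations holding the same entries, which this sketch does not attempt.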


was (Author: msahyoun):
[~mkl] thanks for pointing it out. Now this gets more and more difficult :-)




[jira] [Commented] (PDFBOX-4723) Add equals() and hashCode() to PDAnnotation and COS objects

2020-01-08 Thread Maruan Sahyoun (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010622#comment-17010622
 ] 

Maruan Sahyoun commented on PDFBOX-4723:


[~mkl] thanks for pointing it out. Now this gets more and more difficult :-)
