Re: Unicode codepoint conversions
You can get the code from https://gist.github.com/cdokolas/8845724f8f4c0335dadfbc6f0c6afe0b

I also have the resulting PDF, whose "ToUnicode" object contains the "different" codepoints, but I don't know how to send it to you. Note that the font I'm using is Noto Sans CJK ("NotoSansCJKsc-Regular"), loaded from a resource.

Thanks in advance,
Constantine

--
There is a computer disease that anybody who works with computers knows about. It's a very serious disease and it interferes completely with the work. The trouble with computers is that you 'play' with them!
- Richard P. Feynman

On Wed, Nov 18, 2020 at 3:58 PM sahy...@fileaffairs.de <sahy...@fileaffairs.de> wrote:
> Could you share a code snippet showing how you are writing/retrieving the data?
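One plausible mechanism behind the "different" codepoints in the ToUnicode object, stated here as an assumption rather than confirmed PDFBox behavior: a font's cmap can map several codepoints (for example U+002D and U+2011) to the same glyph, while a ToUnicode map records only one codepoint per glyph, so text extraction can only return that one. The Java sketch below is a simplified, hypothetical model of that inversion; the class name, method names and glyph IDs are made up, and whether Noto Sans CJK really shares one glyph between these two characters is not verified here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical, simplified model (NOT PDFBox code) of deriving a ToUnicode
 * map (glyph ID -> codepoint) by inverting a font cmap (codepoint -> glyph ID).
 */
public class ToUnicodeModel {

    /** Inverts the cmap, keeping only one codepoint per glyph ID. */
    static Map<Integer, Integer> buildToUnicode(Map<Integer, Integer> cmap) {
        Map<Integer, Integer> toUnicode = new TreeMap<>();
        for (Map.Entry<Integer, Integer> e : cmap.entrySet()) {
            // The first codepoint seen for a glyph wins; any later codepoint
            // sharing the same glyph is silently dropped.
            toUnicode.putIfAbsent(e.getValue(), e.getKey());
        }
        return toUnicode;
    }

    /** "Writes" text as glyph IDs, then "extracts" it through the ToUnicode map. */
    static String roundTrip(String text, Map<Integer, Integer> cmap) {
        Map<Integer, Integer> toUnicode = buildToUnicode(cmap);
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            int gid = cmap.get(cp);                  // writing: codepoint -> glyph
            out.appendCodePoint(toUnicode.get(gid)); // extraction: glyph -> codepoint
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up cmap in which U+2011 is listed before U+002D and both use glyph 42.
        Map<Integer, Integer> cmap = new LinkedHashMap<>();
        cmap.put(0x2011, 42); // NON-BREAKING HYPHEN
        cmap.put(0x002D, 42); // HYPHEN-MINUS, same glyph
        cmap.put(0x0041, 7);  // LATIN CAPITAL LETTER A

        String extracted = roundTrip("A-A", cmap);
        extracted.codePoints().forEach(cp -> System.out.printf("U+%04X%n", cp));
    }
}
```

Running `main` prints U+0041, U+2011, U+0041: the hyphen-minus written between the two letters comes back as a non-breaking hyphen, because glyph 42 can carry only one Unicode value in the reverse map.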
Re: Unicode codepoint conversions
On Wednesday, 18.11.2020, 13:58 +0200, Constantine Dokolas wrote:
> Since I'm writing unit tests to verify that everything gets written correctly in the PDF from my end (PDF generation), I need to know why, when and how these conversions take place (I first noticed them while writing some CJK codepoints). Any suggestions/pointers?

Could you share a code snippet showing how you are writing/retrieving the data?

BR
Maruan

To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
Unicode codepoint conversions
I noticed that when I write some codepoints to a PDF and then read the text back from the generated PDF (via PDFTextStripper), some conversions happen. For example, the simple hyphen character (0x2D, "HYPHEN-MINUS") gets converted to a non-breaking hyphen (0x2011, "NON-BREAKING HYPHEN").

Since I'm writing unit tests to verify that everything on my end (PDF generation) gets written correctly to the PDF, I need to know why, when and how these conversions take place. I first noticed them while writing some CJK codepoints. Any suggestions/pointers?

Constantine
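For the unit tests mentioned above, a helper that reports each codepoint substitution, instead of a plain pass/fail string comparison, makes conversions like 0x2D to 0x2011 visible. The sketch below is hypothetical (not part of PDFBox), uses only the Java standard library, and assumes one-to-one substitutions with no inserted or dropped characters. In a real test the second argument would come from `PDFTextStripper.getText`, whose output usually includes line separators that should be trimmed first.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical test helper: compares the text written to a PDF with the text
 * read back, codepoint by codepoint, and lists every substitution found.
 */
public class CodepointDiff {

    static List<String> diff(String expected, String actual) {
        int[] exp = expected.codePoints().toArray();
        int[] act = actual.codePoints().toArray();
        List<String> changes = new ArrayList<>();
        int n = Math.min(exp.length, act.length);
        for (int i = 0; i < n; i++) {
            if (exp[i] != act[i]) {
                // Position is a codepoint index, not a UTF-16 char index.
                changes.add(String.format("#%d: U+%04X -> U+%04X", i, exp[i], act[i]));
            }
        }
        if (exp.length != act.length) {
            // A length change means the one-to-one assumption does not hold.
            changes.add(String.format("length: %d -> %d codepoints", exp.length, act.length));
        }
        return changes;
    }

    public static void main(String[] args) {
        // The second string stands in for extracted text; it is hard-coded here.
        String written = "foo-bar";
        String extracted = "foo\u2011bar";
        diff(written, extracted).forEach(System.out::println);
    }
}
```

With the hard-coded strings in `main`, it prints `#3: U+002D -> U+2011`; an empty list means the text survived the round trip unchanged.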