On 14.02.2025 12:18, Stefan wrote:
Do you want to display it in your own application?
Yes. I want to display the tag tree with some additional information.
And without the information on the source variant I can't do it.
I also see my original reply was wrong: we do have the forceHexForm
flag. However, from what I can see, it is always false. We could set
it to true here:
public COSString(byte[] bytes)
{
    this(bytes, false);
}
So with the current implementation, all COSString elements created by the
parser are written in the literal string (...) notation when a document is
saved, and thus not necessarily in their original notation. Would that be
a bug?
No... I just found something that I thought existed. This is in COSWriter:
private static void writeString(byte[] bytes, boolean forceHex,
        OutputStream output)
        throws IOException
{
    // check for non-ASCII characters
    boolean isASCII = true;
    if (!forceHex)
    {
        for (byte b : bytes)
        {
            // if the byte is negative then it is an eight bit byte
            // and is outside the ASCII range
            if (b < 0)
            {
                isASCII = false;
                break;
            }
            // PDFBOX-3107 EOL markers within a string are troublesome
            if (b == 0x0d || b == 0x0a)
            {
                isASCII = false;
                break;
            }
        }
    }
    if (isASCII && !forceHex)
    {
        // write ASCII string
        output.write('(');
        for (byte b : bytes)
        {
            switch (b)
            {
                case '(':
                case ')':
                case '\\':
                    output.write('\\');
                    output.write(b);
                    break;
                default:
                    output.write(b);
                    break;
            }
        }
        output.write(')');
    }
    else
    {
        // write hex string
        output.write('<');
        Hex.writeHexBytes(bytes, output);
        output.write('>');
    }
}
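So strings that contain non-ASCII bytes or an EOL marker (CR or LF) are
written in hex notation, as is anything with forceHex set; everything else
is written as a literal (...) string. You could use similar code to decide
how to display your strings.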
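
A minimal sketch of how the check in writeString could be reused for
display, not taken from PDFBox itself. The class StringNotationSketch and
both method names are hypothetical, and the uppercase hex digits are only
an assumption about how you want the output to look. It works on plain
byte arrays, so it runs without PDFBox on the classpath:

```java
final class StringNotationSketch
{
    // Hypothetical helper: same decision as COSWriter.writeString,
    // i.e. hex if forced, or if any byte is non-ASCII, CR or LF.
    static boolean wouldBeWrittenAsHex(byte[] bytes, boolean forceHex)
    {
        if (forceHex)
        {
            return true;
        }
        for (byte b : bytes)
        {
            // a negative byte has the high bit set (outside 7-bit ASCII)
            if (b < 0 || b == 0x0d || b == 0x0a)
            {
                return true;
            }
        }
        return false;
    }

    // Hypothetical helper: renders the bytes in the notation chosen above.
    static String toDisplayForm(byte[] bytes, boolean forceHex)
    {
        StringBuilder sb = new StringBuilder();
        if (wouldBeWrittenAsHex(bytes, forceHex))
        {
            sb.append('<');
            for (byte b : bytes)
            {
                // mask to 0..255 so negative bytes format as two digits
                sb.append(String.format("%02X", b & 0xFF));
            }
            sb.append('>');
        }
        else
        {
            sb.append('(');
            for (byte b : bytes)
            {
                // escape the same three characters as writeString does
                if (b == '(' || b == ')' || b == '\\')
                {
                    sb.append('\\');
                }
                sb.append((char) b);
            }
            sb.append(')');
        }
        return sb.toString();
    }

    public static void main(String[] args)
    {
        byte[] ascii = { 'a', '(', 'b', ')' };
        byte[] utf16 = { (byte) 0xFE, (byte) 0xFF, 0x00, 0x41 };
        System.out.println(toDisplayForm(ascii, false)); // (a\(b\))
        System.out.println(toDisplayForm(utf16, false)); // <FEFF0041>
    }
}
```

With a COSString you would presumably pass getBytes() and
getForceHexForm() as the two arguments. Note that this only tells you
which notation the writer would pick on save. It does not recover the
notation used in the source file.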
Tilman