On 14.02.2025 12:18, Stefan wrote:
Do you want to display it in your own application?
Yes. I want to display the tag tree with some additional information.

And without the information on the source variant I can't do it.

I also see that my original reply was wrong: we do have the forceHexForm
flag. However, from what I see, it is always false. We could set it to
true here:

    public COSString(byte[] bytes)
    {
        this(bytes, false);
    }
So with the current implementation, all COSString elements created by the
parser are written in the literal string (...) notation when a document is
saved, and thus not necessarily in their original notation. Wouldn't that be a bug?


No... I just found something that I thought existed. This is in COSWriter:


    private static void writeString(byte[] bytes, boolean forceHex, OutputStream output)
            throws IOException
    {
        // check for non-ASCII characters
        boolean isASCII = true;
        if (!forceHex)
        {
            for (byte b : bytes)
            {
                // if the byte is negative then it is an eight bit byte and is outside the ASCII range
                if (b < 0)
                {
                    isASCII = false;
                    break;
                }
                // PDFBOX-3107 EOL markers within a string are troublesome
                if (b == 0x0d || b == 0x0a)
                {
                    isASCII = false;
                    break;
                }
            }
        }

        if (isASCII && !forceHex)
        {
            // write ASCII string
            output.write('(');
            for (byte b : bytes)
            {
                switch (b)
                {
                    case '(':
                    case ')':
                    case '\\':
                        output.write('\\');
                        output.write(b);
                        break;
                    default:
                        output.write(b);
                        break;
                }
            }
            output.write(')');
        }
        else
        {
            // write hex string
            output.write('<');
            Hex.writeHexBytes(bytes, output);
            output.write('>');
        }
    }

So strings that contain non-ASCII bytes (or a CR/LF) are written in hex form. You could use similar code to decide how to display your strings.
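
As a rough illustration of that idea, here is a minimal standalone sketch that applies the same decision as writeString to a raw byte array and renders it as text. The class and method names (StringFormSketch, wouldBeWrittenAsHex, display) are made up for this example and are not part of PDFBox, and the uppercase hex digits are my own choice. Note that this only predicts how the writer would emit the string; it does not recover the notation used in the source file.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox: mirrors the literal-vs-hex
// decision from the writeString method shown above.
class StringFormSketch
{
    // True if the bytes would be written as <hex> rather than (literal).
    static boolean wouldBeWrittenAsHex(byte[] bytes, boolean forceHex)
    {
        if (forceHex)
        {
            return true;
        }
        for (byte b : bytes)
        {
            // negative byte: outside 7-bit ASCII; CR/LF: treated like PDFBOX-3107
            if (b < 0 || b == 0x0d || b == 0x0a)
            {
                return true;
            }
        }
        return false;
    }

    // Renders the bytes for display in the notation the writer would pick.
    static String display(byte[] bytes, boolean forceHex)
    {
        StringBuilder sb = new StringBuilder();
        if (wouldBeWrittenAsHex(bytes, forceHex))
        {
            sb.append('<');
            for (byte b : bytes)
            {
                // assumption: uppercase hex digits, two per byte
                sb.append(String.format("%02X", b & 0xFF));
            }
            sb.append('>');
        }
        else
        {
            sb.append('(');
            for (byte b : bytes)
            {
                // escape the same three characters as writeString
                if (b == '(' || b == ')' || b == '\\')
                {
                    sb.append('\\');
                }
                sb.append((char) b);
            }
            sb.append(')');
        }
        return sb.toString();
    }

    public static void main(String[] args)
    {
        // prints (Hello \(PDF\))
        System.out.println(display("Hello (PDF)".getBytes(StandardCharsets.US_ASCII), false));
        // prints <FEFF0041>
        System.out.println(display(new byte[] { (byte) 0xFE, (byte) 0xFF, 0x00, 0x41 }, false));
    }
}
```

In a real application you would presumably pass in the bytes and the force-hex flag from the COSString you are displaying; how you obtain them depends on the PDFBox version you use.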

Tilman

