Hi,
> On 18.07.2016 at 14:15, Adam Retter <[email protected]> wrote:
>
> Using pdf-box-2.0.2:
>
> I am trying to set dc:publisher to "Çâmára Münícìpål de Matelâñdia" in
> the metadata of my PDF however my diacritical characters seem to get
> mangled when I try and read the PDF back.
>
> My writing code looks like:
>
> PDDocument doc = ...
> PDDocumentCatalog catalog = ...
>
> PDMetadata metadataStream = Optional.ofNullable(catalog.getMetadata())
> .orElseGet(() -> new PDMetadata(doc));
> XMPMetadata xmpMetadata = null;
> try(COSInputStream is = metadataStream.createInputStream()) {
> xmpMetadata = new DomXmpParser().parse(is);
> } catch(XmpParsingException e) {
> LOG.warn(e);
> xmpMetadata = XMPMetadata.createXMPMetadata();
> }
> DublinCoreSchema dcMetadata = xmpMetadata.createAndAddDublinCoreSchema();
> dcMetadata.addPublisher("Çâmára Münícìpål de Matelâñdia");
> catalog.setMetadata(metadataStream);
> ByteArrayOutputStream baos = new ByteArrayOutputStream();
> XmpSerializer serializer = new XmpSerializer();
> serializer.serialize(xmpMetadata, baos, false);
> metadataStream.importXMPMetadata(baos.toByteArray());
>
>
> My reading code looks like:
>
> PDDocument doc = PDDocument.load(is);
> PDDocumentCatalog catalog = doc.getDocumentCatalog();
> PDMetadata metadata = catalog.getMetadata();
> try(InputStream is = metadata.createInputStream()) {
> Files.copy(is, Paths.get("/tmp/metadata.xml"));
> }
>
>
> However in the output XML I am seeing this:
>
> <dc:publisher>
> <rdf:Bag>
> <rdf:li>??m?ra M?n?c?p?l de Matel??dia</rdf:li>
> </rdf:Bag>
> </dc:publisher>
>
>
I've tested various ways of saving the file (yours, serializing to a
FileOutputStream, …) and all of them work when viewing the content in a
browser or a text editor:
<dc:publisher>
<rdf:Bag>
<rdf:li>Çâmára Münícìpål de Matelâñdia</rdf:li>
</rdf:Bag>
</dc:publisher>
Where do you see that string?
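For what it's worth, the pattern of '?' characters in your output is what
you would get if the string passed through a charset that cannot represent
those characters, for example US-ASCII, somewhere between writing and
viewing. Below is a minimal sketch using only the JDK (no PDFBox involved)
that reproduces it. The class and method names are made up for
illustration, and I am only assuming that something like this happens on
your side, since I can't reproduce it here.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetCheck {

    // Hypothetical helper: encodes the text with the given charset and
    // decodes it again, as a writer/reader pair using that charset would.
    static String roundTrip(String text, Charset charset) {
        // getBytes(Charset) replaces unmappable characters with the
        // charset's replacement byte, which is '?' for US-ASCII.
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // The publisher string from the original mail, written with Unicode
        // escapes so the source file encoding cannot interfere.
        String publisher =
            "\u00C7\u00E2m\u00E1ra M\u00FCn\u00EDc\u00ECp\u00E5l de Matel\u00E2\u00F1dia";

        // UTF-8 can represent every character, so the text survives.
        System.out.println(roundTrip(publisher, StandardCharsets.UTF_8).equals(publisher));

        // US-ASCII cannot, so each accented character becomes '?'.
        System.out.println(roundTrip(publisher, StandardCharsets.US_ASCII));
    }
}
```

If the second line printed matches what you see, it would be worth
checking which tool or console shows you the XML, and which charset it
uses, before suspecting the PDF itself.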
BR
Maruan
> So I guess something is up with the character encoding somewhere? Is
> this something I am doing incorrectly, perhaps I need to specify UTF-8
> somewhere (my character set)? or is this a bug in pdf-box?
>
> Cheers Adam.
>
> --
> Adam Retter
>
> skype: adam.retter
> tweet: adamretter
> http://www.adamretter.org.uk
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>