Hi,

Bernd Engelhardt wrote:
> Hi,
> I am trying to extract some text content from a PDF file. If I use a PDF file
> with Western content, everything works perfectly. If I try to do the same with a
> PDF file which contains some Asian characters, I get an exception (see
> below). As far as I can see, the cmap "UniJIS-UCS2-H" is in the
> "Resources/cmap" folder. Do I have to load the cmap or is this map
> automatically loaded? Does PDFBox support Asian languages? What do I have to do
> to support such languages? Any hint is welcome. Thanks

I'm afraid there are still some issues concerning Asian mappings. See [1] and [2] for further details.
BR
Andreas Lehmkühler

[1] https://issues.apache.org/jira/browse/PDFBOX-509
[2] https://issues.apache.org/jira/browse/PDFBOX-420