Dear Tilman, could you share the Java code you used to process it successfully? Thank you very much.
--
Sent from my NetEase Mail mobile app

At 2015-07-28 21:16:16, "牛小伟" <[email protected]> wrote:
>Dear Tilman,
>I found the earlier problem.
>Now I get this error. Please help, thanks:
>
>Jul 28, 2015 9:11:07 PM java.util.prefs.WindowsPreferences <init>
>WARNING: Could not open/create prefs root node Software\JavaSoft\Prefs at root 0x80000002. Windows RegCreateKeyEx(...) returned error code 5.
>Jul 28, 2015 9:11:07 PM org.apache.pdfbox.pdmodel.font.FileSystemFontProvider loadCache
>WARNING: New fonts found, font cache will be re-built
>Jul 28, 2015 9:11:07 PM org.apache.pdfbox.pdmodel.font.FileSystemFontProvider <init>
>WARNING: Building font cache, this may take a while
>Jul 28, 2015 9:11:08 PM org.apache.pdfbox.pdmodel.font.FileSystemFontProvider saveCache
>WARNING: Finished building font cache, found 543 fonts
>Jul 28, 2015 9:11:08 PM org.apache.pdfbox.pdmodel.font.PDCIDFontType0 <init>
>WARNING: Using fallback ArialUnicodeMS for CID-keyed font HeiseiKakuGo-W5
>java.io.IOException: Error: Could not find referenced cmap stream UniJIS-UCS2-HW-H
>    at org.apache.fontbox.cmap.CMapParser.getExternalCMap(CMapParser.java:413)
>    at org.apache.fontbox.cmap.CMapParser.parsePredefined(CMapParser.java:85)
>    at org.apache.pdfbox.pdmodel.font.CMapManager.getPredefinedCMap(CMapManager.java:54)
>    at org.apache.pdfbox.pdmodel.font.PDType0Font.readEncoding(PDType0Font.java:161)
>    at org.apache.pdfbox.pdmodel.font.PDType0Font.<init>(PDType0Font.java:109)
>    at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:83)
>    at org.apache.pdfbox.pdmodel.PDResources.getFont(PDResources.java:121)
>    at org.apache.pdfbox.contentstream.operator.text.SetFontAndSize.process(SetFontAndSize.java:50)
>    at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:794)
>    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:460)
>    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:437)
>    at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:148)
>    at org.apache.pdfbox.text.PDFTextStreamEngine.processPage(PDFTextStreamEngine.java:117)
>    at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:367)
>    at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:303)
>    at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:248)
>    at org.apache.pdfbox.text.PDFTextStripper.getText(PDFTextStripper.java:209)
>    at com.niu.pdf.demo.PDFBoxDemo.getText(PDFBoxDemo.java:16)
>    at com.niu.pdf.demo.PDFBoxDemo.null(Unknown Source)
>
>At 2015-07-27 09:13:14, "牛小伟" <[email protected]> wrote:
>>Dear Tilman,
>>Thanks. Do you know when version 2.0 will be released?
>>
>>Best regards,
>>Niu Xiaowei
>>
>>--
>>Sent from my NetEase Mail mobile app
>>
>>At 2015-07-26 22:07:29, "牛小伟" <[email protected]> wrote:
>>>Dear Tilman,
>>>Thanks for your support. The original file is at the company, so I can't get it.
>>>But I made a simple one using iText.
>>>It uses the same encoding, and PDFBox can't process it either.
>>>Please check the attachment.
>>>
>>>Thanks,
>>>Best regards,
>>>Niu X
>>>
>>>At 2015-07-25 15:42:55, "牛小伟" <[email protected]> wrote:
>>>>Dear team,
>>>>We are using your product PDFBox 1.6 for text extraction.
>>>>But when we process a document with the encoding UniJIS-UCS2-HW-H,
>>>>the output is unreadable, like this:
>>>>????????????????????????3?????????????
>>>>We have tried some other ways to process it, but they don't work.
>>>>We also have some documents with the encoding GBK-EUC-H, and PDFBox
>>>>handles them perfectly. I also tried PDFBox 1.8; it didn't work either.
>>>>I checked the charsets bundled with PDFBox, and both encodings are included.
>>>>I don't know why one works and the other doesn't.
>>>>I hope you can help with this. Thank you very much.
>>>>
>>>>Best regards.
>>>>
>>>>Snapshot of the document's encoding:
>>>>

