[jira] [Commented] (PDFBOX-4648) OpenType Layout tables used in font ABCDEE+Times New Roman,Bold are not implemented in PDFBox and will be ignored
[ https://issues.apache.org/jira/browse/PDFBOX-4648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932100#comment-16932100 ]

Tilman Hausherr commented on PDFBOX-4648:
-----------------------------------------

No, you would have to use OCR. The problem occurs when creating the PDF. One could recreate the ToUnicode table, but it would take hours and would probably work only for that file.

https://stackoverflow.com/questions/39485920/how-to-add-unicode-in-truetype0font-on-pdfbox-2-0-0

> OpenType Layout tables used in font ABCDEE+Times New Roman,Bold are not implemented in PDFBox and will be ignored
> -----------------------------------------
>
>                 Key: PDFBOX-4648
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4648
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Text extraction
>    Affects Versions: 2.0.4
>            Reporter: wanling
>            Priority: Major
>         Attachments: 5e214f828f164322a6600f183191dda5-Adobe.txt, 5e214f828f164322a6600f183191dda5-PDFBox.txt, 5e214f828f164322a6600f183191dda5.pdf, image-2019-09-12-08-47-32-706.png, image-2019-09-18-05-55-26-771.png
>
>
> No PostScript name information is provided for the font Arial-BoldMT
> OpenType Layout tables used in font ABCDEE+Times New Roman,Bold are not implemented in PDFBox and will be ignored
> No Unicode mapping for CID+47 (47) in font ABCDEE+Times New Roman,Bold
>
> Adobe is normal, but PDFBox can't see _parts_ of the text (not all of it). OCI can't see it completely.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
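On the remark above that a recreated ToUnicode table would probably work only for that file, a rough illustration may help. The sketch below is plain Java and does not use PDFBox; the class `ManualMappingSketch`, the codes 20 to 24 and the second font name are all hypothetical. It assumes a hand-made table keyed by the subset font name and the character code, and shows that the same table decodes nothing once the font name (and possibly the glyph numbering) of another file differs.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model only: this is NOT PDFBox code and not how PDFBox stores mappings.
class ManualMappingSketch {

    // A table entry is identified by the (subset) font name plus the character code.
    static String key(String fontName, int code) {
        return fontName + "#" + code;
    }

    // Decodes the codes with the hand-made table; unknown codes become U+FFFD.
    static String decode(String fontName, int[] codes, Map<String, String> manualTable) {
        StringBuilder text = new StringBuilder();
        for (int code : codes) {
            text.append(manualTable.getOrDefault(key(fontName, code), "\uFFFD"));
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Hypothetical codes, as if found by comparing each glyph with the rendered page.
        String font = "ABCDEE+Times New Roman,Bold";
        Map<String, String> table = new HashMap<>();
        table.put(key(font, 20), "5");
        table.put(key(font, 21), "1");
        table.put(key(font, 22), " ");
        table.put(key(font, 23), "T");
        table.put(key(font, 24), "M");

        int[] codes = {20, 21, 21, 22, 23, 24};

        // Works for the file the table was built for.
        System.out.println(decode(font, codes, table));

        // A different file may carry a different subset, so nothing matches.
        System.out.println(decode("XYZABC+Times New Roman,Bold", codes, table));
    }
}
```

Every entry has to be found by hand, which is why the approach is slow and, as noted above, unlikely to carry over to other documents.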
[jira] [Commented] (PDFBOX-4648) OpenType Layout tables used in font ABCDEE+Times New Roman,Bold are not implemented in PDFBox and will be ignored
[ https://issues.apache.org/jira/browse/PDFBOX-4648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932083#comment-16932083 ]

wanling commented on PDFBOX-4648:
---------------------------------

Do you know any way to solve this problem?
[jira] [Commented] (PDFBOX-4648) OpenType Layout tables used in font ABCDEE+Times New Roman,Bold are not implemented in PDFBox and will be ignored
[ https://issues.apache.org/jira/browse/PDFBOX-4648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932024#comment-16932024 ]

Tilman Hausherr commented on PDFBOX-4648:
-----------------------------------------

"511tm" is missing both in Adobe and in PDFBox. If you look at the font "F5" in PDFDebugger you'll see that the column "Unicode character" is missing.

!image-2019-09-18-05-55-26-771.png!
[jira] [Commented] (PDFBOX-4648) OpenType Layout tables used in font ABCDEE+Times New Roman,Bold are not implemented in PDFBox and will be ignored
[ https://issues.apache.org/jira/browse/PDFBOX-4648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931969#comment-16931969 ]

wanling commented on PDFBOX-4648:
---------------------------------

Sorry, I got "SLIM CUT". That was a typo when writing here. I was concerned with '511tm', so I overlooked it.
[jira] [Commented] (PDFBOX-4648) OpenType Layout tables used in font ABCDEE+Times New Roman,Bold are not implemented in PDFBox and will be ignored
[ https://issues.apache.org/jira/browse/PDFBOX-4648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931619#comment-16931619 ]

Tilman Hausherr commented on PDFBOX-4648:
-----------------------------------------

But you wrote that you got "SLM CUT" (instead of "SLIM CUT"). Or was this a typo when writing here? Do you get "SLM CUT" or "SLIM CUT" with text extraction from 2.0.16?
[jira] [Commented] (PDFBOX-4648) OpenType Layout tables used in font ABCDEE+Times New Roman,Bold are not implemented in PDFBox and will be ignored
[ https://issues.apache.org/jira/browse/PDFBOX-4648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931266#comment-16931266 ]

wanling commented on PDFBOX-4648:
---------------------------------

My computer display is the same as yours. So far, no solution has been found.
[jira] [Commented] (PDFBOX-4648) OpenType Layout tables used in font ABCDEE+Times New Roman,Bold are not implemented in PDFBox and will be ignored
[ https://issues.apache.org/jira/browse/PDFBOX-4648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928282#comment-16928282 ]

Tilman Hausherr commented on PDFBOX-4648:
-----------------------------------------

The squares are Adobe only, so we can't do anything. "511 TM" is also missing in Adobe's text extraction. This is because the font has no ToUnicode stream.

"SLIM CUT" appears fine here, even if I use 2.0.4. Please try again with 2.0.16, make sure you have a current Java version on your computer, then download and run PDFDebugger and look for the font F4 in your file. Here's how it looks on my system:

!image-2019-09-12-08-46-39-391.png!
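To illustrate why a missing ToUnicode stream makes text such as "511 TM" vanish from the extraction, here is a minimal sketch in plain Java. It is a simplified model, not PDFBox code: the class `ToUnicodeSketch`, the font name `MappedFont` and the character codes are hypothetical, and the real lookup in PDFBox has more fallbacks. The idea is only that each character code is looked up in a code-to-Unicode table; a code without an entry can still be drawn on screen but produces no text, only a warning like the one quoted in the issue description.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative model only: this is NOT PDFBox code.
class ToUnicodeSketch {

    // Extracted text plus the warnings collected along the way.
    static final class Result {
        final String text;
        final List<String> warnings;

        Result(String text, List<String> warnings) {
            this.text = text;
            this.warnings = warnings;
        }
    }

    // Looks up every code in the table; codes without an entry are skipped.
    static Result extract(String fontName, int[] codes, Map<Integer, String> toUnicode) {
        StringBuilder text = new StringBuilder();
        List<String> warnings = new ArrayList<>();
        for (int code : codes) {
            String unicode = toUnicode.get(code);
            if (unicode == null) {
                // The glyph is visible when rendered, but there is no text to emit.
                warnings.add("No Unicode mapping for CID+" + code + " (" + code + ") in font " + fontName);
            } else {
                text.append(unicode);
            }
        }
        return new Result(text.toString(), warnings);
    }

    public static void main(String[] args) {
        // Hypothetical font with a complete table: codes 10..17 spell "SLIM CUT".
        Map<Integer, String> mapped = new HashMap<>();
        String word = "SLIM CUT";
        for (int i = 0; i < word.length(); i++) {
            mapped.put(10 + i, String.valueOf(word.charAt(i)));
        }
        Result ok = extract("MappedFont", new int[] {10, 11, 12, 13, 14, 15, 16, 17}, mapped);
        System.out.println("[" + ok.text + "] warnings=" + ok.warnings.size());

        // Font without any table: every code is dropped and reported.
        Map<Integer, String> empty = new HashMap<>();
        Result lost = extract("ABCDEE+Times New Roman,Bold", new int[] {47, 48, 48}, empty);
        System.out.println("[" + lost.text + "] warnings=" + lost.warnings.size());
        lost.warnings.forEach(System.out::println);
    }
}
```

In this model the second call returns an empty string, which corresponds to what both extractions show for "511 TM": the page displays the glyphs, but no tool can name the characters without a mapping.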
[jira] [Commented] (PDFBOX-4648) OpenType Layout tables used in font ABCDEE+Times New Roman,Bold are not implemented in PDFBox and will be ignored
[ https://issues.apache.org/jira/browse/PDFBOX-4648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928263#comment-16928263 ]

wanling commented on PDFBOX-4648:
---------------------------------

I can export it like yours, but some of the words are missing. "511 TM SLIM CUT" comes out as "SLM CUT". The "511tm" is missing, and it is useful.
[jira] [Commented] (PDFBOX-4648) OpenType Layout tables used in font ABCDEE+Times New Roman,Bold are not implemented in PDFBox and will be ignored
[ https://issues.apache.org/jira/browse/PDFBOX-4648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928260#comment-16928260 ]

wanling commented on PDFBOX-4648:
---------------------------------

Thanks for your answ
[jira] [Commented] (PDFBOX-4648) OpenType Layout tables used in font ABCDEE+Times New Roman,Bold are not implemented in PDFBox and will be ignored
[ https://issues.apache.org/jira/browse/PDFBOX-4648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927838#comment-16927838 ]

Tilman Hausherr commented on PDFBOX-4648:
-----------------------------------------

I assume this is a follow-up of PDFBOX-4647. You could have reopened the issue. Anyway, I have attached two text extractions, one by PDFBox and one by Adobe. What are you missing?