Re: [DISCUSS] JBIG2-integration with JIRA or github
> On 01.11.2017 at 19:01, Andreas Lehmkuehler wrote:
>
> On 01.11.2017 at 13:59, Maruan Sahyoun wrote:
>> Hi,
>>> On 01.11.2017 at 13:45, Andreas Lehmkuehler wrote:
>>>
>>> Hi all,
>>>
>>> the git-repository for the JBIG2 has been online for a couple of days and we
>>> haven't decided yet what kind of platform we want to integrate with the
>>> repository.
>>>
>>> PDFBox uses svn and integrates with JIRA, so that every checkin is
>>> automatically linked to a JIRA-ticket (as long as one adds the ticket number
>>> to the commit comment).
>> the same is possible with git & svn. E.g. the documentation is using git. As
>> long as you add the JIRA ticket number to the commit message it will link to
>> JIRA.
>> See
>> https://issues.apache.org/jira/browse/PDFBOX-3330?focusedCommentId=16200067&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16200067
>> as an example.
> That integration isn't active yet. We need to ask infra to do so.
>
>>>
>>> The question is, how should we proceed with the JBIG2 repo?
>>> Should we use JIRA as well to track bugs, improvements and any other kind
>>> of requests?
>> +1
>>> Or should we use github and PRs to keep track of all changes?
>>>
>> we can use PRs too if they include the ticket number.
>> Apache Camel has been using git for quite some time. See
>> https://github.com/apache/camel/blob/master/CONTRIBUTING.md#pull-request-at-github
>> for how to handle PRs linked to JIRA.
>>> I'm not really familiar with git (I know a handful of commands to update
>>> our website), but github seems the natural choice for me.
>>>
>> there is an even tighter integration with github now called gitbox. AFAIK
>> Camel is moving to it as are some others
>> https://issues.apache.org/jira/browse/INFRA-15288
> Hmm, I've read about that but I don't understand the difference. Do you
> know/can you explain which advantages/additional functions gitbox offers? Do we
> need them too?

AFAIK the main benefit is how PRs can be merged.

Current approach: http://mahout.apache.org/developers/github.html
Approach with gitbox: http://opennlp.apache.org/using-git.html

So if we expect active contributions through GitHub, gitbox will make it easier.

BR
Maruan

>
>> BR
>> Maruan
>>> WDYT?
>>>
>>> Andreas

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
Re: [DISCUSS] JBIG2-integration with JIRA or github
On 01.11.2017 at 13:45, Andreas Lehmkuehler wrote:
> Hi all,
>
> the git-repository for the JBIG2 has been online for a couple of days and we
> haven't decided yet what kind of platform we want to integrate with the
> repository.
>
> PDFBox uses svn and integrates with JIRA, so that every checkin is
> automatically linked to a JIRA-ticket (as long as one adds the ticket number
> to the commit comment).
>
> The question is, how should we proceed with the JBIG2 repo?
> Should we use JIRA as well to track bugs, improvements and any other kind
> of requests?
> Or should we use github and PRs to keep track of all changes?
>
> I'm not really familiar with git (I know a handful of commands to update
> our website), but github seems the natural choice for me.
>
> WDYT?

I prefer JIRA, but I'd like the new team members to be as comfortable as
possible, so let's hear from them. So I'm neutral on this one.

Tilman
[jira] [Commented] (PDFBOX-3970) x,y co-ordinates of the text inside the cell are not getting correctly.
[ https://issues.apache.org/jira/browse/PDFBOX-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16234527#comment-16234527 ]

Tilman Hausherr commented on PDFBOX-3970:
-----------------------------------------

IIRC text extraction isn't done on annotations. ???

> x,y co-ordinates of the text inside the cell are not getting correctly.
> -----------------------------------------------------------------------
>
> Key: PDFBOX-3970
> URL: https://issues.apache.org/jira/browse/PDFBOX-3970
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 2.0.7
> Environment: Operating system: Windows 7 (64 bit).
> Reporter: Navnath Kumbhar
> Priority: Major
> Labels: how-to
> Attachments: formula-marked-34.png, paragraphNextToTable-marked-1.png,
>              paragraphNextToTable.pdf, simpleAnnotation.pdf
>
>
> Hello Support Team,
> I am working on a project which parses a whole PDF document and stores the
> extracted text in a .txt file which can be read by another product.
> My issue is regarding extracting the text inside the cell of a table:
> *x,y co-ordinates of the text inside the cell are not getting correctly.*
> The Y value of the last text line in the cell is larger than the cell's max-Y
> value.
> I have attached the test file with this bug.
> As you can see in the test document, there is one cell along with text in it
> and a text paragraph next to that cell.
> The x-y coordinates that I get from pdfbox for all the paths (two vertical and
> two horizontal lines) of the cell are:
> (in x1,y1,x2,y2 format)
> Horizontal line 1: [100,88,220,88]
> Horizontal line 2: [100,120,220,120]
> Vertical line 1 : [100,88,100,120]
> Vertical line 2: [220,88,220,120]
> (The Y values of the above paths are final values, obtained by subtracting the
> actual value given by pdfbox from the height of the page, as I see that for
> paths, y-values are processed from bottom to top.)
> And the bounding box of the last line in that cell is: [102,114,59,7] and hence
> the max-Y of that line becomes 121 (min-Y + height).
>
> So, if we consider the max-Y value of that cell (i.e. 120) and that of the last
> line in that cell (i.e. 121), clearly, that line goes out of that cell.
> What can be the possible reason for this?
> Thank you in advance!
> Regards,
> Navnath Kumbhar

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
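The arithmetic in the report above can be reproduced in a few lines of plain Java. This is only a sketch: it uses `java.awt.geom.Rectangle2D` rather than any PDFBox API, the class and method names are made up for illustration, and the page height of 792 in the `flipY` example is an assumption, since the report does not state it.

```java
import java.awt.geom.Rectangle2D;

final class CellContainmentSketch
{
    // Converts a y value measured from the bottom of the page into one measured
    // from the top, as the reporter describes doing for the path coordinates.
    static double flipY(double yFromBottom, double pageHeight)
    {
        return pageHeight - yFromBottom;
    }

    // True if the box lies completely inside the cell (both given as x, y, w, h).
    static boolean fitsInside(Rectangle2D cell, Rectangle2D box)
    {
        return cell.contains(box);
    }

    public static void main(String[] args)
    {
        // Cell built from the four reported lines: x 100..220, y 88..120.
        Rectangle2D cell = new Rectangle2D.Double(100, 88, 120, 32);
        // Bounding box of the last text line as reported: [102,114,59,7].
        Rectangle2D lastLine = new Rectangle2D.Double(102, 114, 59, 7);

        System.out.println("cell max-Y = " + cell.getMaxY());         // 120.0
        System.out.println("line max-Y = " + lastLine.getMaxY());     // 121.0
        System.out.println("line inside cell: " + fitsInside(cell, lastLine)); // false

        // Assumed page height of 792: a path y of 704 from the bottom maps to 88.
        System.out.println("flipped y = " + flipY(704, 792));         // 88.0
    }
}
```

The check confirms the one-unit overshoot the reporter describes; it says nothing about why the text height differs from the drawn cell border.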
[jira] [Commented] (PDFBOX-3986) Bounding box of mathematical symbols are not proper
[ https://issues.apache.org/jira/browse/PDFBOX-3986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16234517#comment-16234517 ]

Tilman Hausherr commented on PDFBOX-3986:
-----------------------------------------

[~lehmi] I disagree here. Yes, the first two numbers are negative, but the sum of y and height is usually positive, and here it isn't.

[~navnath] What I meant is this: consider the glyphs "a" and "g". "a" is printed at the baseline. "g" has a part that is above and a part that is below the baseline. And the summation symbol is fully below the baseline, which IMHO is unusual.

> Bounding box of mathematical symbols are not proper
> ---------------------------------------------------
>
> Key: PDFBOX-3986
> URL: https://issues.apache.org/jira/browse/PDFBOX-3986
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 2.0.7
> Environment: Windows 7 (64 bit)
> Reporter: Navnath Kumbhar
> Priority: Major
> Attachments: PDFBOX-3986-reduced.pdf, formula-marked-34.png,
>              formula-marked-37.png, formula.pdf
>
>
> Hello Support Team,
> I am working on a task where I have to extract formulas from a PDF document
> and convert them into images.
> But when I extract them using PDFBox, some of the symbols like *Summation*,
> *Integral*, or *Big Parenthesis* etc. get mixed up with the previous line.
> I checked the output of the DrawPrintTextLocations example with that
> particular PDF document and the result does not look normal.
> The red boxes are not aligned properly in the output, as you will see in the
> attached files.
> I am herewith attaching the output of two pages and the PDF document itself.
> *Please refer to page no. 34 or 37 for this issue.*
> Thank you in advance!
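The distinction drawn in the comment above (a glyph sitting on the baseline, one crossing it, one lying fully below it) comes down to the sign of `y` and of `y + height`. A minimal sketch follows; the class, the enum and all numeric glyph values are hypothetical and chosen only to illustrate the three cases, they are not taken from the font in the attached PDF.

```java
final class BaselineSketch
{
    enum Position { ON_OR_ABOVE, CROSSES, FULLY_BELOW }

    // lowerY is the lower edge of the glyph box relative to the baseline
    // (y axis pointing up), height is the box height.
    static Position classify(float lowerY, float height)
    {
        float upperY = lowerY + height;
        if (upperY <= 0)
        {
            return Position.FULLY_BELOW;   // y + height is not positive
        }
        if (lowerY < 0)
        {
            return Position.CROSSES;       // part below, part above the baseline
        }
        return Position.ON_OR_ABOVE;
    }

    public static void main(String[] args)
    {
        // Hypothetical glyph-space values, for illustration only.
        System.out.println("a: " + classify(0f, 450f));
        System.out.println("g: " + classify(-200f, 650f));
        System.out.println("summation: " + classify(-1400f, 1400f));
    }
}
```

With these made-up numbers the three calls yield `ON_OR_ABOVE`, `CROSSES` and `FULLY_BELOW`, the last being the case described as unusual for the summation symbol.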
[jira] [Commented] (PDFBOX-3986) Bounding box of mathematical symbols are not proper
[ https://issues.apache.org/jira/browse/PDFBOX-3986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16234493#comment-16234493 ]

Andreas Lehmkühler commented on PDFBOX-3986:
--------------------------------------------

Those numbers are provided by the font itself. They are negative by design.

> Bounding box of mathematical symbols are not proper
> ---------------------------------------------------
>
> Key: PDFBOX-3986
> URL: https://issues.apache.org/jira/browse/PDFBOX-3986
Re: [DISCUSS] JBIG2-integration with JIRA or github
On 01.11.2017 at 13:59, Maruan Sahyoun wrote:
> Hi,
>> On 01.11.2017 at 13:45, Andreas Lehmkuehler wrote:
>>
>> Hi all,
>>
>> the git-repository for the JBIG2 has been online for a couple of days and we
>> haven't decided yet what kind of platform we want to integrate with the
>> repository.
>>
>> PDFBox uses svn and integrates with JIRA, so that every checkin is
>> automatically linked to a JIRA-ticket (as long as one adds the ticket number
>> to the commit comment).
> the same is possible with git & svn. E.g. the documentation is using git. As
> long as you add the JIRA ticket number to the commit message it will link to
> JIRA.
> See
> https://issues.apache.org/jira/browse/PDFBOX-3330?focusedCommentId=16200067&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16200067
> as an example.

That integration isn't active yet. We need to ask infra to do so.

>> The question is, how should we proceed with the JBIG2 repo?
>> Should we use JIRA as well to track bugs, improvements and any other kind
>> of requests?
> +1
>> Or should we use github and PRs to keep track of all changes?
> we can use PRs too if they include the ticket number.
> Apache Camel has been using git for quite some time. See
> https://github.com/apache/camel/blob/master/CONTRIBUTING.md#pull-request-at-github
> for how to handle PRs linked to JIRA.
>> I'm not really familiar with git (I know a handful of commands to update
>> our website), but github seems the natural choice for me.
> there is an even tighter integration with github now called gitbox. AFAIK
> Camel is moving to it as are some others
> https://issues.apache.org/jira/browse/INFRA-15288

Hmm, I've read about that but I don't understand the difference. Do you
know/can you explain which advantages/additional functions gitbox offers? Do we
need them too?

> BR
> Maruan
>> WDYT?
>>
>> Andreas
[jira] [Commented] (PDFBOX-3986) Bounding box of mathematical symbols are not proper
[ https://issues.apache.org/jira/browse/PDFBOX-3986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16234162#comment-16234162 ]

Navnath Kumbhar commented on PDFBOX-3986:
-----------------------------------------

What do you mean by *_Font itself insists to do that?_*

> Bounding box of mathematical symbols are not proper
> ---------------------------------------------------
>
> Key: PDFBOX-3986
> URL: https://issues.apache.org/jira/browse/PDFBOX-3986
[jira] [Updated] (PDFBOX-3970) x,y co-ordinates of the text inside the cell are not getting correctly.
[ https://issues.apache.org/jira/browse/PDFBOX-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navnath Kumbhar updated PDFBOX-3970:
------------------------------------
    Attachment: simpleAnnotation.pdf

Hello Tilman,
Thank you for pointing out the right code snippet.
I have done some changes in the LegacyPDFStreamEngine.java
Below is my code change:
{code:java}
@Override
protected void showGlyph(Matrix textRenderingMatrix, PDFont font, int code, String unicode,
                         Vector displacement) throws IOException
{
    //
    // legacy calculations which were previously in PDFStreamEngine
    //
    // DO NOT USE THIS CODE UNLESS YOU ARE WORKING WITH PDFTextStripper.
    // THIS CODE IS DELIBERATELY INCORRECT
    //

    PDGraphicsState state = getGraphicsState();
    Matrix ctm = state.getCurrentTransformationMatrix();
    float fontSize = state.getTextState().getFontSize();
    float horizontalScaling = state.getTextState().getHorizontalScaling() / 100f;
    Matrix textMatrix = getTextMatrix();

    Shape glyphShape = getActualGlyphBoundingBox(textRenderingMatrix, font, code);
    BoundingBox bbox = new BoundingBox((float)glyphShape.getBounds2D().getMinX(),
                                       (float)glyphShape.getBounds2D().getMinY(),
                                       (float)glyphShape.getBounds2D().getMaxX(),
                                       (float)glyphShape.getBounds2D().getMaxY());
    if (bbox.getLowerLeftY() < Short.MIN_VALUE)
    {
        // PDFBOX-2158 and PDFBOX-3130
        // files by Salmat eSolutions / ClibPDF Library
        bbox.setLowerLeftY(- (bbox.getLowerLeftY() + 65536));
    }
    // 1/2 the bbox is used as the height todo: why?
    float glyphHeight = bbox.getHeight()/2;

    /*PDFontDescriptor fontDescriptor = font.getFontDescriptor();
    if (fontDescriptor != null)
    {
        float capHeight = fontDescriptor.getCapHeight();
        if (capHeight != 0 && (capHeight < glyphHeight || glyphHeight == 0))
        {
            glyphHeight = capHeight;
        }
    }*/

    // transformPoint from glyph space -> text space
    float height;
    if (font instanceof PDType3Font)
    {
        height = font.getFontMatrix().transformPoint(0, glyphHeight).y;
    }
    else
    {
        height = glyphHeight / 1000;
    }
    .
    .
    .
}
{code}
And here is *getActualGlyphBoundingBox()* method.
{code:java}
private Shape getActualGlyphBoundingBox(Matrix textRenderingMatrix, PDFont font, int code)
        throws IOException
{
    GeneralPath path = null;
    AffineTransform at = textRenderingMatrix.createAffineTransform();
    at.concatenate(font.getFontMatrix().createAffineTransform());
    if (font instanceof PDType3Font)
    {
        PDType3Font t3Font = (PDType3Font) font;
        PDType3CharProc charProc = t3Font.getCharProc(code);
        if (charProc != null)
        {
            PDRectangle glyphBBox = charProc.getGlyphBBox();
            if (glyphBBox != null)
            {
                path = glyphBBox.toGeneralPath();
            }
        }
    }
    else if (font instanceof PDVectorFont)
    {
        PDVectorFont vectorFont = (PDVectorFont) font;
        path = vectorFont.getPath(code);

        if (font instanceof PDTrueTypeFont)
        {
            PDTrueTypeFont ttFont = (PDTrueTypeFont) font;
            int unitsPerEm = ttFont.getTrueTypeFont().getHeader().getUnitsPerEm();
            at.scale(1000d / unitsPerEm, 1000d / unitsPerEm);
        }
        if (font instanceof PDType0Font)
        {
            PDType0Font t0font = (PDType0Font) font;
            if (t0font.getDescendantFont() instanceof PDCIDFontType2)
            {
                int unitsPerEm = ((PDCIDFontType2) t0font.getDescendantFont())
                        .getTrueTypeFont().getHeader().getUnitsPerEm();
                at.scale(1000d / unitsPerEm, 1000d / unitsPerEm);
            }
        }
    }
    else if (font instanceof PDSimpleFont)
    {
        PDSimpleFont simpleFont = (PDSimpleFont) font;

        // these two lines do not always work, e.g. for the TT fonts in file 032431.pdf
        // which is why PDVectorFont is tried first.
        String name = simpleFont.getEncoding().getName(code);
        path = simpleFont.getPath(name);
    }
    else
    {
        // shouldn't happen, please open issue in JIRA
        System.out.println("Unknown font class: " + font.getClass());
    }
    if (path == null)
    {
        return null;
    }
    //return at.createTransformedShape(path.getBounds2D());
    return path.getBounds2D();
}
{code}
I am getting satisfactory results for text
Re: [DISCUSS] JBIG2-integration with JIRA or github
Hi,

> On 01.11.2017 at 13:45, Andreas Lehmkuehler wrote:
>
> Hi all,
>
> the git-repository for the JBIG2 has been online for a couple of days and we
> haven't decided yet what kind of platform we want to integrate with the
> repository.
>
> PDFBox uses svn and integrates with JIRA, so that every checkin is
> automatically linked to a JIRA-ticket (as long as one adds the ticket number to
> the commit comment).

the same is possible with git & svn. E.g. the documentation is using git. As
long as you add the JIRA ticket number to the commit message it will link to
JIRA.
See
https://issues.apache.org/jira/browse/PDFBOX-3330?focusedCommentId=16200067&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16200067
as an example.

>
> The question is, how should we proceed with the JBIG2 repo?
> Should we use JIRA as well to track bugs, improvements and any other kind of
> requests?

+1

> Or should we use github and PRs to keep track of all changes?
>

we can use PRs too if they include the ticket number.
Apache Camel has been using git for quite some time. See
https://github.com/apache/camel/blob/master/CONTRIBUTING.md#pull-request-at-github
for how to handle PRs linked to JIRA.

> I'm not really familiar with git (I know a handful of commands to update our
> website), but github seems the natural choice for me.
>

there is an even tighter integration with github now called gitbox. AFAIK
Camel is moving to it as are some others
https://issues.apache.org/jira/browse/INFRA-15288

BR
Maruan

> WDYT?
>
> Andreas
RE: Running tika-eval on the Rackspace vm
Sorry. Fixed.

-----Original Message-----
From: Tilman Hausherr [mailto:thaush...@t-online.de]
Sent: Tuesday, October 31, 2017 6:08 PM
To: dev@pdfbox.apache.org
Subject: Re: Running tika-eval on the Rackspace vm

On 31.10.2017 at 20:53, Allison, Timothy B. wrote:
>> It's not possible to rename / remove the files / directories mentioned in
>> part 1 due to not having the permissions.
> Gah. Sorry. Tilman, I added you to "collab" and chgrp to collab on /work
> /data2/docs /data3/batch_runs and /data4/batch_runs.

But the directories themselves don't have "w" rights for group so I can't
profit from my membership... (unless I missed something, I haven't done much
*nix since the 90s)

For example I can't rename
/work/batch-apps/tika_working/logs
to
/work/batch-apps/tika_working/___logs .

Tilman

>
>> The directory is named batch-apps, not batch_apps.
> Fixed. Thank you.
>
>> Re the "A" version - is this the "good" version, so I could simply download
>> tika-app and put it there? Or just build tika with a specific PDFBox
>> version?
> If the current version of tika-app has the right version of PDFBox for your
> "before" examples, then yes, you can just download tika-app.jar. We release
> less frequently than PDFBox, so it's possible that you'll want to build from
> scratch with the most recent previous release of PDFBox.
>
> In my mind, A is the "before/baseline" version and B is the
> SNAPSHOT/RC version. So, hopefully, B is the "good" one.
>
> Let me know what other problems you encounter.
>
> Cheers,
>
> Tim
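The permission problem described above (group membership without the "w" bit on the directory, so entries inside it cannot be renamed) can be checked programmatically. The following is a minimal sketch using the standard `java.nio.file` POSIX attribute API; the class name and the temporary demo directory are made up for illustration, it only works on a POSIX file system, and it is not part of tika-eval.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

final class GroupWriteCheck
{
    // Renaming an entry needs write permission on its parent directory,
    // so this is the bit a group member has to look for.
    static boolean groupCanWrite(Path dir)
    {
        try
        {
            Set<PosixFilePermission> perms = Files.getPosixFilePermissions(dir);
            return perms.contains(PosixFilePermission.GROUP_WRITE);
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
    }

    // Creates a throwaway directory with the given mode string and checks it.
    static boolean demo(String mode)
    {
        try
        {
            Path dir = Files.createTempDirectory("perm-demo");
            Files.setPosixFilePermissions(dir, PosixFilePermissions.fromString(mode));
            boolean result = groupCanWrite(dir);
            Files.delete(dir);
            return result;
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args)
    {
        System.out.println("rwxr-x--- group can write: " + demo("rwxr-x---"));
        System.out.println("rwxrwx--- group can write: " + demo("rwxrwx---"));
    }
}
```

Applied to the case in the mail, the directory to inspect would be the parent of the entry being renamed, i.e. `/work/batch-apps/tika_working` for the `logs` directory.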
[DISCUSS] JBIG2-integration with JIRA or github
Hi all,

the git-repository for the JBIG2 has been online for a couple of days and we
haven't decided yet what kind of platform we want to integrate with the
repository.

PDFBox uses svn and integrates with JIRA, so that every checkin is
automatically linked to a JIRA-ticket (as long as one adds the ticket number
to the commit comment).

The question is, how should we proceed with the JBIG2 repo?
Should we use JIRA as well to track bugs, improvements and any other kind
of requests?
Or should we use github and PRs to keep track of all changes?

I'm not really familiar with git (I know a handful of commands to update
our website), but github seems the natural choice for me.

WDYT?

Andreas
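The convention mentioned above, putting the ticket number into the commit comment so that the commit can be linked to its JIRA ticket, can be illustrated with a small sketch. This is not how the ASF integration is implemented; the class, the regular expression and the sample commit message are assumptions for illustration. Only the `https://issues.apache.org/jira/browse/` URL prefix and the `PDFBOX-` key prefix come from the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TicketKeySketch
{
    // Assumed key format: project prefix, a hyphen, then digits.
    private static final Pattern KEY = Pattern.compile("\\b(PDFBOX-\\d+)\\b");

    // Returns every ticket key found in the commit message, in order.
    static List<String> ticketKeys(String commitMessage)
    {
        List<String> keys = new ArrayList<>();
        Matcher matcher = KEY.matcher(commitMessage);
        while (matcher.find())
        {
            keys.add(matcher.group(1));
        }
        return keys;
    }

    static String issueUrl(String key)
    {
        return "https://issues.apache.org/jira/browse/" + key;
    }

    public static void main(String[] args)
    {
        // Hypothetical commit message.
        String message = "PDFBOX-3330: update documentation";
        for (String key : ticketKeys(message))
        {
            System.out.println(key + " -> " + issueUrl(key));
        }
    }
}
```

A commit comment without a key yields an empty list, which matches the caveat in the mail that the link only appears when the ticket number is added.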
Re: [VOTE] Release Apache PDFBox 2.0.8
Hi,

+1

Thanks,
Timo

On 30.10.2017 at 19:47, Andreas Lehmkuehler wrote:
> Hi,
>
> a candidate for the PDFBox 2.0.8 release is available at:
>
>     https://dist.apache.org/repos/dist/dev/pdfbox/2.0.8/
>
> The release candidate is a zip archive of the sources in:
>
>     http://svn.apache.org/repos/asf/pdfbox/tags/2.0.8/
>
> The SHA1 checksum of the archive is 5c0607144dde1b7af3dd428cafbd2c9c29617ab3.
>
> Please vote on releasing this package as Apache PDFBox 2.0.8.
> The vote is open for the next 72 hours and passes if a majority
> of at least three +1 PDFBox PMC votes are cast.
>
>     [ ] +1 Release this package as Apache PDFBox 2.0.8
>     [ ] -1 Do not release this package because...
>
> Here is my +1
>
> Andreas

--

 Timo Boehme
 OntoChem IT Solutions GmbH
 Blücherstraße 24
 06120 Halle (Saale)
 Germany

 phone: +49 345 478 047 4| fax: +49 345 478 047 1
 email: timo.boe...@ontochem.com | web: www.ontochem.com
 HRB 21962 Amtsgericht Stendal | USt-IdNr.: DE815563824
 managing director : Lutz Weber
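Checking a downloaded release candidate against the SHA1 checksum quoted in the vote mail can be done with the JDK alone. The sketch below is one possible way to do it, not the project's release procedure: the class name is made up, the archive path is passed on the command line because the mail does not name the file, and the whole file is read into memory for brevity.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Sha1Check
{
    // Returns the SHA-1 digest of the data as a lowercase hex string.
    static String sha1Hex(byte[] data)
    {
        try
        {
            byte[] digest = MessageDigest.getInstance("SHA-1").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest)
            {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        }
        catch (NoSuchAlgorithmException e)
        {
            // SHA-1 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws IOException
    {
        if (args.length != 1)
        {
            System.out.println("usage: java Sha1Check.java <path-to-archive>");
            return;
        }
        // Checksum as announced in the vote mail.
        String expected = "5c0607144dde1b7af3dd428cafbd2c9c29617ab3";
        String actual = sha1Hex(Files.readAllBytes(Paths.get(args[0])));
        System.out.println(actual.equals(expected) ? "checksum matches" : "MISMATCH: " + actual);
    }
}
```

For large archives a streaming digest (for example via `java.security.DigestInputStream`) would avoid loading the file at once; the in-memory version is kept here only to stay short.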
[jira] [Commented] (PDFBOX-3985) IOException thrown from org.apache.fontbox.ttf.CMAPEncodingEntry.processSubtype14
[ https://issues.apache.org/jira/browse/PDFBOX-3985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16233711#comment-16233711 ]

Tilman Hausherr commented on PDFBOX-3985:
-----------------------------------------

The latest one brings just a log message and no exception.

> IOException thrown from org.apache.fontbox.ttf.CMAPEncodingEntry.processSubtype14
> ---------------------------------------------------------------------------------
>
> Key: PDFBOX-3985
> URL: https://issues.apache.org/jira/browse/PDFBOX-3985
> Project: PDFBox
> Issue Type: Improvement
> Components: FontBox
> Affects Versions: 2.0.7
> Reporter: Tomonori Soejima
> Priority: Minor
>
> I ran into this issue while processing a pdf file through elasticsearch and
> it turns out that the error was because [the method is not implemented|https://apache.googlesource.com/pdfbox/+/refs/heads/trunk/fontbox/src/main/java/org/apache/fontbox/ttf/CmapSubtable.java#327]
>
> Below is a snippet of the stack trace I ran into.
> Is there any plan to implement this method?
> An error occured when reading table cmap
> java.io.IOException: CMap subtype 14 not yet implemented
>     at org.apache.fontbox.ttf.CMAPEncodingEntry.processSubtype14(CMAPEncodingEntry.java:304)
>     at org.apache.fontbox.ttf.CMAPEncodingEntry.initSubtable(CMAPEncodingEntry.java:114)
>     at org.apache.fontbox.ttf.CMAPTable.initData(CMAPTable.java:100)
>     at org.apache.fontbox.ttf.TrueTypeFont.initializeTable(TrueTypeFont.java:280)
>     at org.apache.fontbox.ttf.AbstractTTFParser.parseTables(AbstractTTFParser.java:128)
>     at org.apache.fontbox.ttf.TTFParser.parseTables(TTFParser.java:80)
>     at org.apache.fontbox.ttf.AbstractTTFParser.parseTTF(AbstractTTFParser.java:109)
>     at org.apache.fontbox.ttf.TTFParser.parseTTF(TTFParser.java:25)
>     at org.apache.fontbox.ttf.AbstractTTFParser.parseTTF(AbstractTTFParser.java:84)
>     at org.apache.fontbox.ttf.TTFParser.parseTTF(TTFParser.java:25)
>     at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.getTTFFont(PDTrueTypeFont.java:632)
>     at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.getFontWidth(PDTrueTypeFont.java:673)
>     at org.apache.pdfbox.pdmodel.font.PDSimpleFont.getFontWidth(PDSimpleFont.java:231)
>     at org.apache.pdfbox.pdmodel.font.PDSimpleFont.getSpaceWidth(PDSimpleFont.java:533)
>     at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:355)
>     at org.apache.pdfbox.util.operator.ShowTextGlyph.process(ShowTextGlyph.java:62)
>     at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:557)
>     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
>     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
>     at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
>     at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:458)
>     at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:383)
>     at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:342)
>     at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:148)
>     at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:148)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>     at org.apache.tika.Tika.parseToString(Tika.java:537)
[jira] [Commented] (PDFBOX-3987) Apache PDFBox {2.0.6,2.0.7} java.lang.NoSuchMethodError: org.apache.fontbox.ttf.TrueTypeFont.getOriginalDataSize()
[ https://issues.apache.org/jira/browse/PDFBOX-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16233706#comment-16233706 ]

Tilman Hausherr commented on PDFBOX-3987:
-----------------------------------------

It is definitely in 2.0.7 but not in 2.0.6. Please check your class path, it should have only one version.

> Apache PDFBox {2.0.6,2.0.7} java.lang.NoSuchMethodError: org.apache.fontbox.ttf.TrueTypeFont.getOriginalDataSize()
> ------------------------------------------------------------------------------------------------------------------
>
> Key: PDFBOX-3987
> URL: https://issues.apache.org/jira/browse/PDFBOX-3987
> Project: PDFBox
> Issue Type: Bug
> Components: FontBox
> Affects Versions: 2.0.6, 2.0.7
> Environment: Oracle Linux Server 6.8, Sun/Oracle Java SE JDK
>              1.8.0_141, jetty-8.1.12.v20130726, ActiveWeb-1.15
> Reporter: Sergei Haramundanis
> Priority: Major
>
> The following exception occurs during PDF generation using Apache PDFBox. It
> appears to be caused because Apache PDFBox {2.0.7,2.0.6} is bundled with and
> uses the dependent library Apache FontBox org.apache.pdfbox:fontbox:bundle:2.0.7,
> of which the class org.apache.fontbox.ttf.TrueTypeFont does not include the
> implementation for getOriginalDataSize(), although it is documented in the
> API docs.
> java.lang.NoSuchMethodError: org.apache.fontbox.ttf.TrueTypeFont.getOriginalDataSize()J
>     at org.apache.pdfbox.pdmodel.font.TrueTypeEmbedder.buildFontFile2(TrueTypeEmbedder.java:117)
>     at org.apache.pdfbox.pdmodel.font.PDCIDFontType2Embedder.buildSubset(PDCIDFontType2Embedder.java:106)
>     at org.apache.pdfbox.pdmodel.font.TrueTypeEmbedder.subset(TrueTypeEmbedder.java:319)
>     at org.apache.pdfbox.pdmodel.font.PDType0Font.subset(PDType0Font.java:176)
>     at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1270)
>     at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1249)
>     at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1237)
>     ...
> The web application source code does not directly call this method, so it is
> an internal dependent call made by Apache PDFBox.
> This is a runtime error only; no related errors were observed during the build
> process.
> This issue first appears in Apache PDFBox 2.0.6 and is not present in Apache
> PDFBox 2.0.5.
> The current workaround is to downgrade Apache PDFBox to 2.0.5, which temporarily
> solves the problem until the bundled Apache FontBox can be fixed.
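Following the advice above to check the class path for more than one version, a quick way to see which jar a class was actually loaded from is to ask its `ProtectionDomain` for the code source. This is a generic JDK sketch, not a PDFBox utility; the class name `WhichJar` is made up, and it falls back to `java.lang.String` when no class name is given so that it runs without PDFBox on the class path.

```java
import java.security.CodeSource;

final class WhichJar
{
    // Returns the location (usually a jar or directory URL) a class was loaded
    // from, or a placeholder for classes without a code source.
    static String locationOf(Class<?> type)
    {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null)
        {
            return "(no code source, e.g. a JDK core class)";
        }
        return String.valueOf(source.getLocation());
    }

    public static void main(String[] args) throws ClassNotFoundException
    {
        String name = args.length > 0 ? args[0] : "java.lang.String";
        System.out.println(name + " loaded from " + locationOf(Class.forName(name)));
    }
}
```

Run inside the affected web application (or with the same class path) and given `org.apache.fontbox.ttf.TrueTypeFont` as the argument, the printed location would show whether an older fontbox jar is being picked up ahead of the expected one.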