Re: [DISCUSS] JBIG2-integration with JIRA or github

2017-11-01 Thread Maruan Sahyoun

> On 01.11.2017 at 19:01, Andreas Lehmkuehler wrote:
> 
> On 01.11.2017 at 13:59, Maruan Sahyoun wrote:
>> Hi,
>>> On 01.11.2017 at 13:45, Andreas Lehmkuehler wrote:
>>> 
>>> Hi all,
>>> 
>>> the git-repository for the JBIG2 has been online for a couple of days and we 
>>> haven't decided yet what kind of platform we want to integrate with the 
>>> repository.
>>> 
>>> PDFBox uses svn and integrates with JIRA, so that every checkin is 
>>> automatically linked to a JIRA-ticket (as long as one adds the ticket number 
>>> to the commit comment).
>> the same is possible with git, just as with svn. E.g. the documentation is using git. As 
>> long as you add the JIRA ticket number to the commit message, it will link to 
>> JIRA.
>> See
>> https://issues.apache.org/jira/browse/PDFBOX-3330?focusedCommentId=16200067&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16200067
>> as an example.
> That integration isn't active yet. We need to ask infra to do so.
> 
>>> 
>>> The question is, how should we proceed with the JBIG2 repo?
>>> Should we use JIRA as well to track bugs, improvements and any other kind 
>>> of requests?
>> +1
>>> Or should we use github and PRs to keep track of all changes?
>>> 
>> we can use PRs too if they include the ticket number.
>> Apache Camel has been using git for quite some time. See 
>> https://github.com/apache/camel/blob/master/CONTRIBUTING.md#pull-request-at-github
>> for how to handle PRs linked to JIRA.
>>> I'm not really familiar with git (I know a handful of commands to update 
>>> our website), but github seems the natural choice for me.
>>> 
>> there is an even tighter integration with github now called gitbox. AFAIK 
>> Camel is moving to it, as are some others
>> https://issues.apache.org/jira/browse/INFRA-15288
> Hmm, I've read about that but I don't understand the difference. Do you 
> know/can you explain which advantages/additional functions gitbox offers? Do we need 
> them too?

AFAIK the main benefit is how PRs can be merged.

Current approach: http://mahout.apache.org/developers/github.html
Approach with gitbox: http://opennlp.apache.org/using-git.html

So if we expect active contributions through GitHub, gitbox will make it 
easier.

BR
Maruan

> 
>> BR
>> Maruan
>>> WDYT?
>>> 
>>> Andreas
>>> 
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
>>> For additional commands, e-mail: dev-h...@pdfbox.apache.org
>>> 



Re: [DISCUSS] JBIG2-integration with JIRA or github

2017-11-01 Thread Tilman Hausherr

On 01.11.2017 at 13:45, Andreas Lehmkuehler wrote:

> Hi all,
>
> the git-repository for the JBIG2 has been online for a couple of days and we 
> haven't decided yet what kind of platform we want to integrate with 
> the repository.
>
> PDFBox uses svn and integrates with JIRA, so that every checkin is 
> automatically linked to a JIRA-ticket (as long as one adds the ticket 
> number to the commit comment).
>
> The question is, how should we proceed with the JBIG2 repo?
> Should we use JIRA as well to track bugs, improvements and any other 
> kind of requests?
>
> Or should we use github and PRs to keep track of all changes?
>
> I'm not really familiar with git (I know a handful of commands to 
> update our website), but github seems the natural choice for me.
>
> WDYT? 

I prefer JIRA, but I'd like the new team members to be as 
comfortable as possible, so let's hear from them. So I'm neutral on this one.


Tilman




[jira] [Commented] (PDFBOX-3970) x,y co-ordinates of the text inside the cell are not getting correctly.

2017-11-01 Thread Tilman Hausherr (JIRA)

[ https://issues.apache.org/jira/browse/PDFBOX-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16234527#comment-16234527 ]

Tilman Hausherr commented on PDFBOX-3970:
-

IIRC text extraction isn't done on annotations. ???

> x,y co-ordinates of the text inside the cell are not getting correctly.
> ---
>
>                 Key: PDFBOX-3970
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3970
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.7
>         Environment: Operating system: Windows 7 (64 bit).
>            Reporter: Navnath Kumbhar
>            Priority: Major
>              Labels: how-to
>         Attachments: formula-marked-34.png, paragraphNextToTable-marked-1.png, 
>                      paragraphNextToTable.pdf, simpleAnnotation.pdf
>
>
> Hello Support Team,
> I am working on a project which parses a whole PDF document and stores the 
> extracted text in a .txt file which can be read by another product.
> My issue is regarding extracting the text inside the cell of a table: 
> *the x,y coordinates of the text inside the cell are not reported correctly.*
> The Y value of the last text line in the cell comes out larger than the cell's max-Y 
> value.
> I have attached the test file with this bug.
> As you can see in the test document, there is one cell with text in it 
> and a text paragraph next to that cell.
> The x-y coordinates that I get from pdfbox for all the paths (two vertical and 
> two horizontal lines) of the cell are:
> (in x1,y1,x2,y2 format)
> Horizontal line 1: [100,88,220,88]
> Horizontal line 2: [100,120,220,120]
> Vertical line 1: [100,88,100,120]
> Vertical line 2: [220,88,220,120]
> (The Y values of the above paths are final values, obtained by subtracting the actual value 
> given by pdfbox from the height of the page, as I see that for paths, y-values are 
> processed from bottom to top.)
> The bounding box of the last line in that cell is [102,114,59,7], and hence 
> the max-Y of that line becomes 121 (min-Y + height).
>  
> So, if we compare the max-Y value of that cell (i.e. 120) with that of the last line 
> in that cell (i.e. 121), clearly that line extends beyond the cell.
> What could be the reason for this?
> Thank you in advance!
> Regards,
> Navnath Kumbhar
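
For anyone following along, the arithmetic in the report can be replayed in a few lines of plain Java. This is only a sketch: the class and method names are made up for illustration and are not PDFBox API, the bounding box [102,114,59,7] is assumed to be in x, y, width, height order (the report only confirms min-Y and height), and the page height of 792 in the flip example is a hypothetical value.

```java
import java.awt.geom.Rectangle2D;

// Illustrative helper, not part of PDFBox.
class CellContainmentCheck {

    // Converts a bottom-up y value (as reported for paths) to a top-down one.
    static double flipY(double pageHeight, double bottomUpY) {
        return pageHeight - bottomUpY;
    }

    // True if the text box lies vertically inside the cell.
    static boolean fitsVertically(Rectangle2D cell, Rectangle2D text) {
        return text.getMinY() >= cell.getMinY() && text.getMaxY() <= cell.getMaxY();
    }

    // How far the text box extends past the lower edge of the cell (0 if it fits).
    static double verticalOverflow(Rectangle2D cell, Rectangle2D text) {
        return Math.max(0, text.getMaxY() - cell.getMaxY());
    }

    public static void main(String[] args) {
        // Cell spanned by the four reported lines: x 100..220, y 88..120.
        Rectangle2D cell = new Rectangle2D.Double(100, 88, 220 - 100, 120 - 88);
        // Last text line of the cell, assumed to be x, y, width, height.
        Rectangle2D lastLine = new Rectangle2D.Double(102, 114, 59, 7);

        System.out.println("cell max-Y: " + cell.getMaxY());          // 120.0
        System.out.println("line max-Y: " + lastLine.getMaxY());      // 121.0
        System.out.println("fits:       " + fitsVertically(cell, lastLine));   // false
        System.out.println("overflow:   " + verticalOverflow(cell, lastLine)); // 1.0

        // Hypothetical page height of 792: a bottom-up y of 704 becomes 88 top-down.
        System.out.println("flipped y:  " + flipY(792, 704));         // 88.0
    }
}
```

So the numbers in the report are self-consistent: the line overshoots the cell by exactly one unit. That fits a box height that is an approximation rather than the true glyph extent, but the sketch itself cannot tell which side is off.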



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)




[jira] [Commented] (PDFBOX-3986) Bounding box of mathematical symbols are not proper

2017-11-01 Thread Tilman Hausherr (JIRA)

[ https://issues.apache.org/jira/browse/PDFBOX-3986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16234517#comment-16234517 ]

Tilman Hausherr commented on PDFBOX-3986:
-

[~lehmi] I disagree here. Yes the first two numbers are negative, but the sum 
of y and height is usually positive, but here it isn't.

[~navnath] What I meant is this: consider the glyphs "a" and "g". "a" is 
printed at the baseline. "g" has a part that is above and a part that is below 
the baseline. And the summation symbol is fully below the baseline, which IMHO 
is unusual.
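
To make the baseline argument concrete, here is a small sketch that classifies a glyph box by its lower-left y and its height, with the baseline at y = 0. The class name and all numbers are hypothetical and chosen only for illustration; they are not taken from the font in the attached PDF.

```java
// Illustrative helper, not part of PDFBox.
class BaselinePosition {

    enum Position { ON_OR_ABOVE_BASELINE, CROSSES_BASELINE, FULLY_BELOW_BASELINE }

    // lowerLeftY and height are in glyph space; the baseline is at y = 0.
    static Position classify(float lowerLeftY, float height) {
        float top = lowerLeftY + height;
        if (top <= 0) {
            return Position.FULLY_BELOW_BASELINE;   // y + height is not positive
        }
        if (lowerLeftY < 0) {
            return Position.CROSSES_BASELINE;       // part above, part below
        }
        return Position.ON_OR_ABOVE_BASELINE;
    }

    public static void main(String[] args) {
        // Hypothetical glyph boxes.
        System.out.println("a:   " + classify(0f, 450f));        // ON_OR_ABOVE_BASELINE
        System.out.println("g:   " + classify(-200f, 650f));     // CROSSES_BASELINE
        System.out.println("sum: " + classify(-1400f, 1000f));   // FULLY_BELOW_BASELINE
    }
}
```

The third case is the one described above as unusual: both y and y + height are negative, so the whole glyph sits under the baseline.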

> Bounding box of mathematical symbols are not proper
> ---
>
>                 Key: PDFBOX-3986
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3986
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.7
>         Environment: Windows 7 (64 bit)
>            Reporter: Navnath Kumbhar
>            Priority: Major
>         Attachments: PDFBOX-3986-reduced.pdf, formula-marked-34.png, 
>                      formula-marked-37.png, formula.pdf
>
>
> Hello Support Team,
> I am working on a task where I have to extract formulas from a PDF document and 
> convert them into images.
> But when I extract them using PDFBox, some of the symbols like *Summation*, 
> *Integral*, or *Big Parenthesis* etc. get mixed up with the previous line.
> I checked the output of the DrawPrintTextLocations example with that particular 
> PDF document and the result does not look normal.
> The red boxes are not aligned properly in the output, as you will see in the 
> attached files.
> I am attaching the output of two pages and the PDF document itself.
> *Please refer to page 34 or 37 for this issue.*
> Thank you in advance!






[jira] [Commented] (PDFBOX-3986) Bounding box of mathematical symbols are not proper

2017-11-01 Thread Andreas Lehmkühler (JIRA)

[ https://issues.apache.org/jira/browse/PDFBOX-3986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16234493#comment-16234493 ]

Andreas Lehmkühler commented on PDFBOX-3986:


Those numbers are provided by the font itself. They are negative by design.




Re: [DISCUSS] JBIG2-integration with JIRA or github

2017-11-01 Thread Andreas Lehmkuehler

On 01.11.2017 at 13:59, Maruan Sahyoun wrote:

> Hi,
>
>> On 01.11.2017 at 13:45, Andreas Lehmkuehler wrote:
>>
>> Hi all,
>>
>> the git-repository for the JBIG2 has been online for a couple of days and we haven't 
>> decided yet what kind of platform we want to integrate with the repository.
>>
>> PDFBox uses svn and integrates with JIRA, so that every checkin is 
>> automatically linked to a JIRA-ticket (as long as one adds the ticket number to 
>> the commit comment).
>
> the same is possible with git, just as with svn. E.g. the documentation is using git. As 
> long as you add the JIRA ticket number to the commit message, it will link to JIRA.
>
> See
> https://issues.apache.org/jira/browse/PDFBOX-3330?focusedCommentId=16200067&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16200067
> as an example.

That integration isn't active yet. We need to ask infra to do so.

>> The question is, how should we proceed with the JBIG2 repo?
>> Should we use JIRA as well to track bugs, improvements and any other kind of 
>> requests?
>
> +1
>
>> Or should we use github and PRs to keep track of all changes?
>
> we can use PRs too if they include the ticket number.
>
> Apache Camel has been using git for quite some time. See 
> https://github.com/apache/camel/blob/master/CONTRIBUTING.md#pull-request-at-github
> for how to handle PRs linked to JIRA.
>
>> I'm not really familiar with git (I know a handful of commands to update our 
>> website), but github seems the natural choice for me.
>
> there is an even tighter integration with github now called gitbox. AFAIK Camel 
> is moving to it, as are some others
>
> https://issues.apache.org/jira/browse/INFRA-15288

Hmm, I've read about that but I don't understand the difference. Do you know/can 
you explain which advantages/additional functions gitbox offers? Do we need them too?

> BR
> Maruan
>
>> WDYT?
>>
>> Andreas




[jira] [Commented] (PDFBOX-3986) Bounding box of mathematical symbols are not proper

2017-11-01 Thread Navnath Kumbhar (JIRA)

[ https://issues.apache.org/jira/browse/PDFBOX-3986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16234162#comment-16234162 ]

Navnath Kumbhar commented on PDFBOX-3986:
-

What do you mean by *_Font itself insists to do that?_*




[jira] [Updated] (PDFBOX-3970) x,y co-ordinates of the text inside the cell are not getting correctly.

2017-11-01 Thread Navnath Kumbhar (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navnath Kumbhar updated PDFBOX-3970:

Attachment: simpleAnnotation.pdf

Hello Tilman,

Thank you for pointing out the right code snippet. I have made some changes in 
LegacyPDFStreamEngine.java.

Below is my code change:


{code:java}
@Override
protected void showGlyph(Matrix textRenderingMatrix, PDFont font, int code, String unicode,
                         Vector displacement) throws IOException
{
    //
    // legacy calculations which were previously in PDFStreamEngine
    //
    //  DO NOT USE THIS CODE UNLESS YOU ARE WORKING WITH PDFTextStripper.
    //  THIS CODE IS DELIBERATELY INCORRECT
    //

    PDGraphicsState state = getGraphicsState();
    Matrix ctm = state.getCurrentTransformationMatrix();
    float fontSize = state.getTextState().getFontSize();
    float horizontalScaling = state.getTextState().getHorizontalScaling() / 100f;
    Matrix textMatrix = getTextMatrix();

    // bounds of the individual glyph (needs java.awt.Shape and java.awt.geom.Rectangle2D)
    Shape glyphShape = getActualGlyphBoundingBox(textRenderingMatrix, font, code);

    BoundingBox bbox;
    if (glyphShape != null)
    {
        Rectangle2D bounds = glyphShape.getBounds2D();
        bbox = new BoundingBox((float) bounds.getMinX(), (float) bounds.getMinY(),
                               (float) bounds.getMaxX(), (float) bounds.getMaxY());
    }
    else
    {
        // getActualGlyphBoundingBox() returns null when no glyph path is found;
        // fall back to the font bounding box instead of failing with a NullPointerException
        bbox = font.getBoundingBox();
    }
    if (bbox.getLowerLeftY() < Short.MIN_VALUE)
    {
        // PDFBOX-2158 and PDFBOX-3130
        // files by Salmat eSolutions / ClibPDF Library
        bbox.setLowerLeftY(- (bbox.getLowerLeftY() + 65536));
    }
    // 1/2 the bbox is used as the height todo: why?
    float glyphHeight = bbox.getHeight() / 2;

    /*PDFontDescriptor fontDescriptor = font.getFontDescriptor();
    if (fontDescriptor != null)
    {
        float capHeight = fontDescriptor.getCapHeight();
        if (capHeight != 0 && (capHeight < glyphHeight || glyphHeight == 0))
        {
            glyphHeight = capHeight;
        }
    }*/

    // transformPoint from glyph space -> text space
    float height;
    if (font instanceof PDType3Font)
    {
        height = font.getFontMatrix().transformPoint(0, glyphHeight).y;
    }
    else
    {
        height = glyphHeight / 1000;
    }

    .
    .
    .
}

{code}

And here is the *getActualGlyphBoundingBox()* method.



{code:java}
private Shape getActualGlyphBoundingBox(Matrix textRenderingMatrix, PDFont font, int code)
        throws IOException
{
    GeneralPath path = null;
    // note: "at" is built up below but never applied, because the untransformed
    // bounds are returned at the end (the transformed variant is commented out)
    AffineTransform at = textRenderingMatrix.createAffineTransform();
    at.concatenate(font.getFontMatrix().createAffineTransform());
    if (font instanceof PDType3Font)
    {
        PDType3Font t3Font = (PDType3Font) font;
        PDType3CharProc charProc = t3Font.getCharProc(code);
        if (charProc != null)
        {
            PDRectangle glyphBBox = charProc.getGlyphBBox();
            if (glyphBBox != null)
            {
                path = glyphBBox.toGeneralPath();
            }
        }
    }
    else if (font instanceof PDVectorFont)
    {
        PDVectorFont vectorFont = (PDVectorFont) font;
        path = vectorFont.getPath(code);

        if (font instanceof PDTrueTypeFont)
        {
            PDTrueTypeFont ttFont = (PDTrueTypeFont) font;
            int unitsPerEm = ttFont.getTrueTypeFont().getHeader().getUnitsPerEm();
            at.scale(1000d / unitsPerEm, 1000d / unitsPerEm);
        }
        if (font instanceof PDType0Font)
        {
            PDType0Font t0font = (PDType0Font) font;
            if (t0font.getDescendantFont() instanceof PDCIDFontType2)
            {
                int unitsPerEm = ((PDCIDFontType2) t0font.getDescendantFont())
                        .getTrueTypeFont().getHeader().getUnitsPerEm();
                at.scale(1000d / unitsPerEm, 1000d / unitsPerEm);
            }
        }
    }
    else if (font instanceof PDSimpleFont)
    {
        PDSimpleFont simpleFont = (PDSimpleFont) font;

        // these two lines do not always work, e.g. for the TT fonts in file 032431.pdf
        // which is why PDVectorFont is tried first.
        String name = simpleFont.getEncoding().getName(code);
        path = simpleFont.getPath(name);
    }
    else
    {
        // shouldn't happen, please open issue in JIRA
        System.out.println("Unknown font class: " + font.getClass());
    }
    if (path == null)
    {
        return null;
    }

    //return at.createTransformedShape(path.getBounds2D());
    return path.getBounds2D();
}
{code}


I am getting satisfactory results for text 

Re: [DISCUSS] JBIG2-integration with JIRA or github

2017-11-01 Thread Maruan Sahyoun
Hi,

> On 01.11.2017 at 13:45, Andreas Lehmkuehler wrote:
> 
> Hi all,
> 
> the git-repository for the JBIG2 has been online for a couple of days and we 
> haven't decided yet what kind of platform we want to integrate with the 
> repository.
> 
> PDFBox uses svn and integrates with JIRA, so that every checkin is 
> automatically linked to a JIRA-ticket (as long as one adds the ticket number to 
> the commit comment).

the same is possible with git, just as with svn. E.g. the documentation is using git. As 
long as you add the JIRA ticket number to the commit message, it will link to 
JIRA. 

See
https://issues.apache.org/jira/browse/PDFBOX-3330?focusedCommentId=16200067&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16200067
as an example.
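
To illustrate what "add the ticket number to the commit message" amounts to, here is a rough sketch of how ticket keys could be picked out of a commit message. It is not how the ASF or JIRA integration is implemented; the class name, the sample messages and the key pattern (upper-case project key, hyphen, digits) are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper, not part of any JIRA or git tooling.
class TicketKeys {

    // Assumed key shape: upper-case project key, a hyphen, then digits, e.g. PDFBOX-3330.
    private static final Pattern KEY = Pattern.compile("\\b[A-Z][A-Z0-9]+-\\d+\\b");

    // Returns every ticket key found in the commit message, in order of appearance.
    static List<String> find(String commitMessage) {
        List<String> keys = new ArrayList<>();
        Matcher matcher = KEY.matcher(commitMessage);
        while (matcher.find()) {
            keys.add(matcher.group());
        }
        return keys;
    }

    public static void main(String[] args) {
        // Hypothetical commit messages.
        System.out.println(find("PDFBOX-3330: update the documentation")); // [PDFBOX-3330]
        System.out.println(find("update the documentation"));              // []
    }
}
```

A commit whose message carries no key, like the second one, simply has nothing to link to. Note that the pattern is loose and would also match strings such as "SHA-1".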

> 
> The question is, how should we proceed with the JBIG2 repo?
> Should we use JIRA as well to track bugs, improvements and any other kind of 
> requests?

+1

> Or should we use github and PRs to keep track of all changes?
> 

we can use PRs too if they include the ticket number.

Apache Camel has been using git for quite some time. See 
https://github.com/apache/camel/blob/master/CONTRIBUTING.md#pull-request-at-github
for how to handle PRs linked to JIRA. 

> I'm not really familiar with git (I know a handful of commands to update our 
> website), but github seems the natural choice for me.
> 

there is an even tighter integration with github now called gitbox. AFAIK Camel 
is moving to it, as are some others 

https://issues.apache.org/jira/browse/INFRA-15288

BR
Maruan


> WDYT?
> 
> Andreas
> 



RE: Running tika-eval on the Rackspace vm

2017-11-01 Thread Allison, Timothy B.
Sorry. Fixed.

-Original Message-
From: Tilman Hausherr [mailto:thaush...@t-online.de] 
Sent: Tuesday, October 31, 2017 6:08 PM
To: dev@pdfbox.apache.org
Subject: Re: Running tika-eval on the Rackspace vm

On 31.10.2017 at 20:53, Allison, Timothy B. wrote:
>> It's not possible to rename / remove the files / directories mentioned in 
>> part 1 due to not having the permissions.
> Gah.  Sorry.  Tilman, I added you to "collab" and chgrp to collab on /work 
> /data2/docs /data3/batch_runs and /data4/batch_runs.

But the directories themselves don't have "w" rights for the group, so I can't 
benefit from my membership... (unless I missed something, I haven't done much 
*nix since the 90s). For example, I can't rename 
/work/batch-apps/tika_working/logs to /work/batch-apps/tika_working/___logs .

Tilman


>
>> The directory is named batch-apps, not batch_apps.
> Fixed.  Thank you.
>
>> Re the "A" version - is this the "good" version, so I could simply  download 
>> tika-app and put it there? Or just build tika with a specific  PDFBox 
>> version?
> If the current version of tika-app has the right version of PDFBox for your 
> "before" examples, then yes, you can just download tika-app.jar.  We release 
> less frequently than PDFBox, so it's possible that you'll want to build from 
> scratch with the most recent previous release of PDFBox.
>
> In my mind, A is the "before/baseline" version and B is the 
> SNAPSHOT/RC version.  So, hopefully, B is the "good" one. 
>
> Let me know what other problems you encounter.
>
> Cheers,
>
>   Tim
>
>
>




[DISCUSS] JBIG2-integration with JIRA or github

2017-11-01 Thread Andreas Lehmkuehler

Hi all,

the git-repository for the JBIG2 has been online for a couple of days and we haven't 
decided yet what kind of platform we want to integrate with the repository.


PDFBox uses svn and integrates with JIRA, so that every checkin is automatically 
linked to a JIRA-ticket (as long as one adds the ticket number to the commit comment).


The question is, how should we proceed with the JBIG2 repo?
Should we use JIRA as well to track bugs, improvements and any other kind of 
requests?

Or should we use github and PRs to keep track of all changes?

I'm not really familiar with git (I know a handful of commands to update our 
website), but github seems the natural choice for me.


WDYT?

Andreas




Re: [VOTE] Release Apache PDFBox 2.0.8

2017-11-01 Thread Timo Boehme

Hi,

+1

Thanks,
Timo


On 30.10.2017 at 19:47, Andreas Lehmkuehler wrote:

> Hi,
>
> a candidate for the PDFBox 2.0.8 release is available at:
>
>      https://dist.apache.org/repos/dist/dev/pdfbox/2.0.8/
>
> The release candidate is a zip archive of the sources in:
>
>      http://svn.apache.org/repos/asf/pdfbox/tags/2.0.8/
>
> The SHA1 checksum of the archive is 
> 5c0607144dde1b7af3dd428cafbd2c9c29617ab3.
>
> Please vote on releasing this package as Apache PDFBox 2.0.8.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 PDFBox PMC votes are cast.
>
>      [ ] +1 Release this package as Apache PDFBox 2.0.8
>      [ ] -1 Do not release this package because...
>
> Here is my +1
>
> Andreas
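
For voters who want to check the archive before casting a vote, the SHA1 comparison can be done with the JDK alone. The following is a minimal sketch, assuming the downloaded archive is passed as the first command-line argument; the class name is made up, and the archive's file name is deliberately not hard-coded because the vote mail does not state it.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative helper, not part of the PDFBox release tooling.
class Sha1Check {

    // Returns the SHA-1 digest of the given bytes as lower-case hex.
    static String sha1Hex(byte[] data) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a standard algorithm name every Java platform must support
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Sanity check with a well-known test vector.
        System.out.println(sha1Hex("abc".getBytes(StandardCharsets.US_ASCII)));
        // a9993e364706816aba3e25717850c26c9cd0d89d

        if (args.length < 1) {
            System.err.println("usage: java Sha1Check <path-to-archive>");
            return;
        }
        // Checksum quoted in the vote mail.
        String expected = "5c0607144dde1b7af3dd428cafbd2c9c29617ab3";
        String actual = sha1Hex(Files.readAllBytes(Paths.get(args[0])));
        System.out.println(actual.equals(expected) ? "checksum OK" : "checksum MISMATCH: " + actual);
    }
}
```

Reading the whole file into memory keeps the sketch short; for very large archives a streaming digest would be the better choice.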





--
Timo Boehme
OntoChem IT Solutions GmbH
Blücherstraße 24
06120 Halle (Saale)
Germany

phone: +49 345 478 047 4| fax: +49 345 478 047 1
email: timo.boe...@ontochem.com | web: www.ontochem.com
HRB 21962 Amtsgericht Stendal   | USt-IdNr.: DE815563824
managing director : Lutz Weber





[jira] [Commented] (PDFBOX-3985) IOException thrown from org.apache.fontbox.ttf.CMAPEncodingEntry.processSubtype14

2017-11-01 Thread Tilman Hausherr (JIRA)

[ https://issues.apache.org/jira/browse/PDFBOX-3985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16233711#comment-16233711 ]

Tilman Hausherr commented on PDFBOX-3985:
-

The latest one brings just a log message and no exception.

> IOException thrown from 
> org.apache.fontbox.ttf.CMAPEncodingEntry.processSubtype14
> -
>
>                 Key: PDFBOX-3985
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3985
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: FontBox
>    Affects Versions: 2.0.7
>            Reporter: Tomonori Soejima
>            Priority: Minor
>
> I ran into this issue while processing a pdf file through elasticsearch and 
> it turns out that the error occurred because [the method is not implemented|https://apache.googlesource.com/pdfbox/+/refs/heads/trunk/fontbox/src/main/java/org/apache/fontbox/ttf/CmapSubtable.java#327]
>  
> Below is a snippet of the stack trace I ran into.
> Is there any plan to implement this method?
> An error occured when reading table cmap
> java.io.IOException: CMap subtype 14 not yet implemented
>     at org.apache.fontbox.ttf.CMAPEncodingEntry.processSubtype14(CMAPEncodingEntry.java:304)
>     at org.apache.fontbox.ttf.CMAPEncodingEntry.initSubtable(CMAPEncodingEntry.java:114)
>     at org.apache.fontbox.ttf.CMAPTable.initData(CMAPTable.java:100)
>     at org.apache.fontbox.ttf.TrueTypeFont.initializeTable(TrueTypeFont.java:280)
>     at org.apache.fontbox.ttf.AbstractTTFParser.parseTables(AbstractTTFParser.java:128)
>     at org.apache.fontbox.ttf.TTFParser.parseTables(TTFParser.java:80)
>     at org.apache.fontbox.ttf.AbstractTTFParser.parseTTF(AbstractTTFParser.java:109)
>     at org.apache.fontbox.ttf.TTFParser.parseTTF(TTFParser.java:25)
>     at org.apache.fontbox.ttf.AbstractTTFParser.parseTTF(AbstractTTFParser.java:84)
>     at org.apache.fontbox.ttf.TTFParser.parseTTF(TTFParser.java:25)
>     at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.getTTFFont(PDTrueTypeFont.java:632)
>     at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.getFontWidth(PDTrueTypeFont.java:673)
>     at org.apache.pdfbox.pdmodel.font.PDSimpleFont.getFontWidth(PDSimpleFont.java:231)
>     at org.apache.pdfbox.pdmodel.font.PDSimpleFont.getSpaceWidth(PDSimpleFont.java:533)
>     at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:355)
>     at org.apache.pdfbox.util.operator.ShowTextGlyph.process(ShowTextGlyph.java:62)
>     at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:557)
>     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
>     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
>     at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
>     at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:458)
>     at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:383)
>     at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:342)
>     at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:148)
>     at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:148)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>     at org.apache.tika.Tika.parseToString(Tika.java:537)






[jira] [Commented] (PDFBOX-3987) Apache PDFBox {2.0.6,2.0.7} java.lang.NoSuchMethodError: org.apache.fontbox.ttf.TrueTypeFont.getOriginalDataSize()

2017-11-01 Thread Tilman Hausherr (JIRA)

[ https://issues.apache.org/jira/browse/PDFBOX-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16233706#comment-16233706 ]

Tilman Hausherr commented on PDFBOX-3987:
-

It is definitely in 2.0.7 but not in 2.0.6. Please check your class path; it 
should contain only one version.
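
One way to check for duplicates is to ask the class loader how many copies of the class it can see and where they come from. The snippet below is a sketch using only JDK calls; the class name ClasspathProbe is made up, and in a web application it would have to run inside the application (for example from a servlet) so that the context class loader is the one that actually loads FontBox.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.reflect.Method;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Illustrative helper, not part of PDFBox or FontBox.
class ClasspathProbe {

    // "a.b.C" -> "a/b/C.class"
    static String resourcePath(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Lists every location from which the class file is visible to the class loader.
    static List<URL> copiesOf(String className) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        try {
            return Collections.list(loader.getResources(resourcePath(className)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // True if the class exposes a public method with the given name.
    static boolean hasPublicMethod(Class<?> type, String methodName) {
        for (Method method : type.getMethods()) {
            if (method.getName().equals(methodName)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String className = args.length > 0 ? args[0] : "org.apache.fontbox.ttf.TrueTypeFont";
        String methodName = args.length > 1 ? args[1] : "getOriginalDataSize";

        List<URL> copies = copiesOf(className);
        System.out.println(copies.size() + " location(s) for " + className + ":");
        for (URL url : copies) {
            System.out.println("  " + url);
        }
        try {
            Class<?> type = Class.forName(className);
            System.out.println(methodName + "() found: " + hasPublicMethod(type, methodName));
        } catch (ClassNotFoundException | LinkageError e) {
            System.out.println("could not load " + className + ": " + e);
        }
    }
}
```

More than one location in the output, or a single location pointing at an older fontbox jar, would match the NoSuchMethodError described in the report.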

> Apache PDFBox {2.0.6,2.0.7} java.lang.NoSuchMethodError: 
> org.apache.fontbox.ttf.TrueTypeFont.getOriginalDataSize()
> --
>
>                 Key: PDFBOX-3987
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3987
>             Project: PDFBox
>          Issue Type: Bug
>          Components: FontBox
>    Affects Versions: 2.0.6, 2.0.7
>         Environment: Oracle Linux Server 6.8, Sun/Oracle Java SE JDK 
>                      1.8.0_141, jetty-8.1.12.v20130726, ActiveWeb-1.15
>            Reporter: Sergei Haramundanis
>            Priority: Major
>
> The following exception occurs during PDF generation using Apache PDFBox. It 
> appears to be caused because Apache PDFBox {2.0.7,2.0.6} is bundled with and 
> uses dependent library Apache FontBox org.apache.pdfbox:fontbox:bundle:2.0.7, 
> of which class org.apache.fontbox.ttf.TrueTypeFont does not include the 
> implementation for getOriginalDataSize(), although it is documented in the 
> API docs.
> java.lang.NoSuchMethodError: org.apache.fontbox.ttf.TrueTypeFont.getOriginalDataSize()J
>     at org.apache.pdfbox.pdmodel.font.TrueTypeEmbedder.buildFontFile2(TrueTypeEmbedder.java:117)
>     at org.apache.pdfbox.pdmodel.font.PDCIDFontType2Embedder.buildSubset(PDCIDFontType2Embedder.java:106)
>     at org.apache.pdfbox.pdmodel.font.TrueTypeEmbedder.subset(TrueTypeEmbedder.java:319)
>     at org.apache.pdfbox.pdmodel.font.PDType0Font.subset(PDType0Font.java:176)
>     at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1270)
>     at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1249)
>     at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1237)
>     ...
> The web application source code does not directly call this method, so it is 
> an internal dependent call made by Apache PDFBox.
> This is a runtime error only, no related errors observed during the build 
> process.
> This issue first appears in Apache PDFBox 2.0.6 and is not present in Apache 
> PDFBox 2.0.5.
> Current workaround is to downgrade Apache PDFBox to 2.0.5, which temporarily 
> solves the problem until the bundled Apache FontBox can be fixed.


