[
https://issues.apache.org/jira/browse/PDFBOX-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14997389#comment-14997389
]
John Hewson commented on PDFBOX-3062:
-
That's good news.
> Text ex
3044:
-
Commit 1713474 from [~lehmi] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1713474 ]
PDFBOX-3044: add *txt files to rat config
> Improve text extraction tests
> -
>
> Key: PDFBOX-3044
> URL: https://issues.
[
https://issues.apache.org/jira/browse/PDFBOX-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated PDFBOX-3044:
Component/s: Text extraction
> Improve text extraction te
yway, probably due to the recent change that the
fontBBox is taken from the dictionary and not from the font. I have not made a
change to use CapHeight. This is done in 1.8 only.
> Text extraction and height different in 2.0
> ---
>
>
e noticed yesterday that many files I had marked for
further review are now good.
{quote}
Are they good with the change to use CapHeight etc., or were they already good anyway?
> Text extraction and height different in 2.0
> ---
>
>
the result is that text that is on different
lines is extracted as being on the same line.
Yes, we could of course get a real BBox by going through the glyphs like I
recently did for type 3 fonts. But that would make text extraction slower.
At this time I'm not saying that anything shoul
o get the bounds of the GeneralPath if we want visual bounds.
was (Author: jahewson):
Those BBox values are pretty reasonable though, certainly not implausible.
Neither CapHeight nor XHeight make sense as substitutes for BBox - we know
those values will always be smaller.
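The distinction drawn in this comment, between a font-wide BBox and the visual bounds of a single glyph's GeneralPath, can be illustrated with plain `java.awt.geom`. This is only a sketch: the glyph outline and the font-wide BBox height of 1200 are made-up values, and the class and method names are hypothetical, not PDFBox API.

```java
import java.awt.geom.GeneralPath;
import java.awt.geom.Rectangle2D;

final class GlyphBoundsSketch {
    // Hypothetical glyph outline in a 1000-unit glyph space (a plain box, 480 units tall).
    static GeneralPath sampleGlyph() {
        GeneralPath path = new GeneralPath();
        path.moveTo(50f, 0f);
        path.lineTo(450f, 0f);
        path.lineTo(450f, 480f);
        path.lineTo(50f, 480f);
        path.closePath();
        return path;
    }

    // Visual height: the height of the bounding rectangle of the outline itself.
    static double visualHeight(GeneralPath glyph) {
        Rectangle2D bounds = glyph.getBounds2D();
        return bounds.getHeight();
    }

    public static void main(String[] args) {
        double assumedFontBBoxHeight = 1200.0; // assumption: a generous font-wide BBox
        double glyphHeight = visualHeight(sampleGlyph());
        System.out.println("font-wide BBox height: " + assumedFontBBoxHeight);
        System.out.println("glyph path height:     " + glyphHeight);
    }
}
```

Computing bounds per glyph gives tighter heights than a font-wide box, at the cost of walking every outline, which is the slowdown the comment warns about.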
> Text extractio
able though, certainly not implausible.
Neither CapHeight nor XHeight make sense as substitutes for BBox - we know
those values will always be smaller.
> Text extraction and height different in 2.0
> ---
>
> Key: PDFBOX-3062
>
values are more realistic than the FontBBox values, which
are too large in the font of this file.
> Text extraction and height different in 2.0
> ---
>
> Key: PDFBOX-3062
> URL: https://issues.apache.org/jira/bro
3044:
-
Commit 1713127 from [~tilman] in branch 'pdfbox/branches/1.8'
[ https://svn.apache.org/r1713127 ]
PDFBOX-3044: add *txt files to rat config
> Improve text extraction tests
> -
>
> Key: PDFBOX-3044
> URL: https://
3062:
-
Commit 1713117 from [~tilman] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1713117 ]
PDFBOX-3062: add test files
> Text extraction and height different in 2.0
> ---
>
> Key: PDFBOX-3062
> URL: h
ably PDFBOX-3078) text extraction of
PDFBOX-3062-N2MOQ7YZICIYGTPLQJAWJ4HLN6CCEMHZ-reduced.pdf is now good:
{quote}
Fraternity Members 480 Male Undergraduates 3495
Sorority Members 484 Female Undergraduates 4880
{quote}
> Text extraction and height different
3044:
-
Commit 1713114 from [~tilman] in branch 'pdfbox/branches/1.8'
[ https://svn.apache.org/r1713114 ]
PDFBOX-3044: change encoding to utf8, don't fail immediately
> Improve text extraction tests
> -
>
> Key: PDFBOX-30
3044:
-
Commit 1713113 from [~tilman] in branch 'pdfbox/branches/1.8'
[ https://svn.apache.org/r1713113 ]
PDFBOX-3044: change encoding to utf8
> Improve text extraction tests
> -
>
> Key: PDFBOX-3044
> URL: https://issues.
your application as far as type 3 heights are
concerned, because [ https://svn.apache.org/r1711758 ] will make all type 3
heights slightly smaller.
> Text extraction getting zero font height, bad widths, and ? for text in this
> PDF with Ty
1.8.11
This resolves your issue. There might still be files with zero type 3 height,
see PDFBOX-3076. And despite resolving, I'd like to hear how you got non-zero
in your initial post.
> Text extraction getting zero font height, bad widths, and ? for text in this
> PDF
5996 space=6.1737375
width=6.0899963]?
String[140.72041,299.28 fs=58.0 xscale=58.0 height=5.1155996 space=6.1737375
width=3.1667938]?
String[522.95984,293.28 fs=58.0 xscale=58.0 height=0.9744 space=2.6529562
width=1.4616089]?
{code}
> Text extraction getting zero font height, bad widths, and ? f
9562
width=1.4616089]?
{code}
> Text extraction getting zero font height, bad widths, and ? for text in this
> PDF with Type 3 Fonts
> --
>
> Key: PDFBOX-2508
>
2508:
-
Commit 1711765 from [~tilman] in branch 'pdfbox/branches/1.8'
[ https://svn.apache.org/r1711765 ]
PDFBOX-2508: get font height from FontBBox item if existing method fails
> Text extraction getting zero font height, bad widths, and ? for text in this
> P
2508:
-
Commit 1711758 from [~tilman] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1711758 ]
PDFBOX-2508: fix bug in construction of font BoundingBox from PDRectangle
> Text extraction getting zero font height, bad widths, and ? for text in this
> P
[
https://issues.apache.org/jira/browse/PDFBOX-2508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated PDFBOX-2508:
Affects Version/s: 2.0.0
> Text extraction getting zero font height, bad widths, and ?
height=0.9744 space=2.6529562
width=1.4616089]?
String[129.84,347.04 fs=58.0 xscale=58.0 height=5.3592 space=5.0880046
width=2.6796112]?
String[211.92,356.8801 fs=58.0 xscale=58.0 height=3.654 space=3.3609185
width=1.7052002]?
{code}
> Text extraction getting zero font height, bad widths,
2508:
-
Commit 1711714 from [~tilman] in branch 'pdfbox/branches/1.8'
[ https://svn.apache.org/r1711714 ]
PDFBOX-2508: correct calculation of glyphSpaceToTextSpaceFactor, remove
misleading comment
> Text extraction getting zero font height, bad widths, and ? for text in this
> P
2508:
-
Commit 1711701 from [~tilman] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1711701 ]
PDFBOX-2508: correct calculation of glyphSpaceToTextSpaceFactor, remove
misleading comment
> Text extraction getting zero font height, bad widths, and ? for text in this
> P
[
https://issues.apache.org/jira/browse/PDFBOX-3053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14980001#comment-14980001
]
John Hewson commented on PDFBOX-3053:
-
Ha! :)
> Text extraction fails with
at!
See also my reply to [this
thread|http://mail-archives.apache.org/mod_mbox/pdfbox-users/201510.mbox/%3ccaaphlv-0z+3ssvpxi8bwvbbqrf-vthkajigwxfedbb3vke_...@mail.gmail.com%3e]
Enjoy!
> Text extraction and height different in 2.0
> ---
>
>
I've deprecated PDFont#getHeight(). It wasn't
used anyway.
> Text extraction and height different in 2.0
> ---
>
> Key: PDFBOX-3062
> URL: https://issues.apache.org/jira/browse/PDFBOX-3062
>
3062:
-
Commit 1711181 from [~jahewson] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1711181 ]
PDFBOX-3062: Deprecate PDFont#getHeight()
> Text extraction and height different in 2.0
> ---
>
> Key: PDFBOX-3062
>
[
https://issues.apache.org/jira/browse/PDFBOX-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated PDFBOX-3062:
Attachment: PDFBOX-3062-N2MOQ7YZICIYGTPLQJAWJ4HLN6CCEMHZ-reduced.pdf
> Text extraction
3062-N2MOQ7YZICIYGTPLQJAWJ4HLN6CCEMHZ-reduced.pdf has the same problem,
but with the opposite effect: due to a bad height, text that is on separate
lines is put together (which results in an incredible mess when using the sort
option).
> Text extraction and height different
FBOX-3067
> Bad space calculation in text extraction
>
>
> Key: PDFBOX-3042
> URL: https://issues.apache.org/jira/browse/PDFBOX-3042
> Project: PDFBox
> Issue Type: Bug
>
3042:
-
Commit 1711070 from [~tilman] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1711070 ]
PDFBOX-3042: don't multiply with horizontalScalingText, as this has already
been done before
> Bad space calculation in text extraction
>
>
>
hlv-0z+3ssvpxi8bwvbbqrf-vthkajigwxfedbb3vke_...@mail.gmail.com%3e]
Enjoy!
> Text extraction and height different in 2.0
> ---
>
> Key: PDFBOX-3062
> URL: https://issues.apache.org/jira/browse/PDFBOX-3062
>
ent font size (with
the TM + CTM taken into account). Note that the textRenderingMatrix (TRM)
passed to onGlyph already has all of these calculations done for you... so use
that!
Enjoy!
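As a rough illustration of what "taking the TM + CTM into account" means: the PDF specification defines the text rendering matrix as the product of a font-size/scaling/rise matrix, the text matrix, and the CTM. The sketch below reproduces that product with `java.awt.geom.AffineTransform` as a stand-in; it is not PDFBox's own matrix class, and the class name, method names and sample values are assumptions for illustration only.

```java
import java.awt.geom.AffineTransform;

final class TextRenderingMatrixSketch {
    // TRM = [fontSize*Th 0 0 fontSize 0 rise] x Tm x CTM (PDF row-vector order),
    // i.e. the parameter matrix is applied first, then Tm, then the CTM.
    static AffineTransform textRenderingMatrix(double fontSize, double horizontalScaling,
                                               double rise, AffineTransform textMatrix,
                                               AffineTransform ctm) {
        AffineTransform params =
                new AffineTransform(fontSize * horizontalScaling, 0, 0, fontSize, 0, rise);
        AffineTransform trm = new AffineTransform(ctm);
        trm.concatenate(textMatrix); // Tm is applied before the CTM
        trm.concatenate(params);     // the parameter matrix is applied before Tm
        return trm;
    }

    // Length of the transformed unit vector (0, 1): the effective vertical font size.
    static double effectiveFontSize(AffineTransform trm) {
        return Math.hypot(trm.getShearX(), trm.getScaleY());
    }

    public static void main(String[] args) {
        // Hypothetical values: Tf size 1, text matrix scaling by 8, CTM scaling by 1.5.
        AffineTransform tm = AffineTransform.getScaleInstance(8, 8);
        AffineTransform ctm = AffineTransform.getScaleInstance(1.5, 1.5);
        AffineTransform trm = textRenderingMatrix(1.0, 1.0, 0.0, tm, ctm);
        System.out.println("effective font size: " + effectiveFontSize(trm));
    }
}
```

The point of the comment is that a text-extraction callback receiving the finished matrix does not need to redo this multiplication; the sketch only shows where the effective size comes from.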
> Text extraction and height different in 2.0
> ---
>
>
(TRM)
passed to onGlyph already has all of these calculations done for you... so use
that!
Enjoy!
> Text extraction and height different in 2.0
> ---
>
> Key: PDFBOX-3062
> URL: https://issues.apache.org/jira/b
needed.
> Text extraction reports zero character widths
> --
>
> Key: PDFBOX-2584
> URL: https://issues.apache.org/jira/browse/PDFBOX-2584
> Project: PDFBox
> Issue Type: Bug
[
https://issues.apache.org/jira/browse/PDFBOX-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated PDFBOX-3062:
Attachment: 005021-reduced.pdf
> Text extraction and height different in
Tilman Hausherr created PDFBOX-3062:
---
Summary: Text extraction and height different in 2.0
Key: PDFBOX-3062
URL: https://issues.apache.org/jira/browse/PDFBOX-3062
Project: PDFBox
Issue
3044:
-
Commit 1710510 from [~tilman] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1710510 ]
PDFBOX-3044: delete possible leftover diff file
> Improve text extraction tests
> -
>
> Key: PDFBOX-3044
> URL: https://
[
https://issues.apache.org/jira/browse/PDFBOX-3053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated PDFBOX-3053:
Issue Type: Sub-task (was: Bug)
Parent: PDFBOX-3058
> Text extraction fails w
here the cweb test came from, or what it was meant to test, it
is probably from long ago. But I am trying to add only small, one page tests on
any new PDFs I add.
{quote}
The cweb file is part of the test files from the beginning. AFAIK it's simply a
test for text extraction in general an
ext extraction fails with type 3 fonts
> ---
>
> Key: PDFBOX-3053
> URL: https://issues.apache.org/jira/browse/PDFBOX-3053
> Project: PDFBox
> Issue Type: Bug
> Components: Text
3053:
-
Commit 1710376 from [~tilman] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1710376 ]
PDFBOX-3053: use Adobe glyph list, not Zapf glyph list
> Text extraction fails with type 3 fonts
> ---
>
> Key: PDFBOX-3053
>
3053:
-
Commit 1710374 from [~tilman] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1710374 ]
PDFBOX-3053: set to text/plain
> Text extraction fails with type 3 fonts
> ---
>
> Key: PDFBOX-3053
> URL: https://
3053:
-
Commit 1710373 from [~tilman] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1710373 ]
PDFBOX-3053: add test files
> Text extraction fails with type 3 fonts
> ---
>
> Key: PDFBOX-3053
> URL: https://
ncoding()
{code}
glyphList = glyphList = GlyphList.getZapfDingbats();
{code}
> Text extraction fails with type 3 fonts
> ---
>
> Key: PDFBOX-3053
> URL: https://issues.apache.org/jira/browse/PDFBOX-3053
>
4 PM:
---
From PDType3Font.readEncoding()
{code}
glyphList = GlyphList.getZapfDingbats();
{code}
was (Author: tilman):
From PDType3Font.readEncoding()
{code}
glyphList = glyphList = GlyphList.getZapfDingbats();
{code}
> Text extraction fails with
[
https://issues.apache.org/jira/browse/PDFBOX-3053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated PDFBOX-3053:
Attachment: PDFBOX-3053-reduced.pdf
> Text extraction fails with type 3 fo
[
https://issues.apache.org/jira/browse/PDFBOX-3053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated PDFBOX-3053:
Attachment: (was: PDFBOX-3053-reduced.pdf)
> Text extraction fails with type 3 fo
on /w /b /j /v /comma /k /period]
{code}
2959 file:
{code}
/Differences [32 /space 69 /E /F 72 /H /I 78 /N /O 82 /R 84 /T /U]
{code}
> Text extraction fails with type 3 fonts
> ---
>
> Key: PDFBOX-3053
> URL: https
[
https://issues.apache.org/jira/browse/PDFBOX-3053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated PDFBOX-3053:
Attachment: PDFBOX-3053-reduced.pdf
> Text extraction fails with type 3 fo
/i
/slash /f /colon /w /b /j /v /comma /k /period]
{code}
2959 file:
{code}
/Differences [32 /space 69 /E /F 72 /H /I 78 /N /O 82 /R 84 /T /U]
{code}
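For readers unfamiliar with the `/Differences` notation quoted above: in the PDF specification each integer sets the current character code, and every glyph name that follows is assigned to consecutive codes. A minimal sketch of that expansion, using the array from the 2959 file as input (the class and method names are hypothetical, and the parser handles only this simple whitespace-separated form):

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class DifferencesSketch {
    // Expands the body of a /Differences array into a code -> glyph name map.
    static Map<Integer, String> expand(String arrayBody) {
        Map<Integer, String> codeToName = new LinkedHashMap<>();
        int code = -1;
        for (String token : arrayBody.trim().split("\\s+")) {
            if (token.startsWith("/")) {
                if (code < 0) {
                    throw new IllegalArgumentException("glyph name before first code");
                }
                // A name is assigned to the current code, which then advances by one.
                codeToName.put(code++, token.substring(1));
            } else {
                // An integer resets the current code.
                code = Integer.parseInt(token);
            }
        }
        return codeToName;
    }

    public static void main(String[] args) {
        Map<Integer, String> map =
                expand("32 /space 69 /E /F 72 /H /I 78 /N /O 82 /R 84 /T /U");
        map.forEach((c, name) -> System.out.println(c + " -> " + name));
    }
}
```

With this input, code 69 maps to E and code 70 to F, and so on. Mapping such glyph names to Unicode then depends on which glyph list is consulted, which is what the fix from the Zapf Dingbats list to the Adobe glyph list in this issue is about.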
> Text extraction fails with type 3 fonts
> ---
>
> Key: PDFBOX-3053
>
st came from, or what it was meant to test, it
is probably from long ago. But I am trying to add only small, one page tests on
any new PDFs I add.
> Improve text extraction tests
> -
>
> Key: PDFBOX-3044
> URL:
ower
the number that can fail in the future to prevent regressions
* Make the files roughly equivalent in length. cweb.pdf is 28 pages long and
all the rest are 1 page, so the test output is almost entirely dominated by
whether we make this file better or worse
> Improve text extractio
[
https://issues.apache.org/jira/browse/PDFBOX-3053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated PDFBOX-3053:
Description:
Text extraction fails with the attached file. It succeeds with Acrobat Reader
[
https://issues.apache.org/jira/browse/PDFBOX-3053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated PDFBOX-3053:
Attachment: PDFBOX-2959-reduced.pdf
> Text extraction fails with type 3 fo
[
https://issues.apache.org/jira/browse/PDFBOX-3053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated PDFBOX-3053:
Attachment: PDFBOX-3053-3YQ2UXRQBBLX5TLKSLFCUZLWXWSI2Z2U.pdf
> Text extraction fails w
Tilman Hausherr created PDFBOX-3053:
---
Summary: Text extraction fails with type 3 fonts
Key: PDFBOX-3053
URL: https://issues.apache.org/jira/browse/PDFBOX-3053
Project: PDFBox
Issue Type
e us some more details what exactly is the issue here?
> Text extraction reports zero character widths
> --
>
> Key: PDFBOX-2584
> URL: https://issues.apache.org/jira/browse/PDFBOX-2584
>
fixed in
1.8.11 as well? Pavel is complaining about 1.8.8. I'm going to check that later
> Text extraction reports zero character widths
> --
>
> Key: PDFBOX-2584
> URL: https://is
pace=2.2251122
width=5.7788696]N
> Text extraction reports zero character widths
> --
>
> Key: PDFBOX-2584
> URL: https://issues.apache.org/jira/browse/PDFBOX-2584
> Project: P
u as desired?
> Improve text extraction tests
> -
>
> Key: PDFBOX-3044
> URL: https://issues.apache.org/jira/browse/PDFBOX-3044
> Project: PDFBox
> Issue Type: Bug
>Affects Ver
3044:
-
Commit 1710270 from [~tilman] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1710270 ]
PDFBOX-3044: set to text/plan
> Improve text extraction tests
> -
>
> Key: PDFBOX-3044
> URL: https://issues.apach
3044:
-
Commit 1710247 from [~tilman] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1710247 ]
PDFBOX-3044: change encoding to utf8, don't fail immediately; output diff
output; use diff library; update test files to utf8
> Improve
3044:
-
Commit 1710250 from [~tilman] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1710250 ]
PDFBOX-3044: change encoding to utf8, don't fail immediately; output diff
output; use diff library; update test files to utf8
> Improve
seem to be UTF16 encoded. I'm
having a really difficult time using these files with the tools that I
typically use (git, meld, etc.) Would it be possible to change the encoding to
UTF8?
By @Tilman Hausherr
I'm expanding this as a long term issue to improve the testing of text
extract
[
https://issues.apache.org/jira/browse/PDFBOX-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated PDFBOX-3044:
Summary: Improve text extraction tests (was: Test files character encoding)
> Impr
3044:
-
Commit 1710241 from [~tilman] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1710241 ]
PDFBOX-3044: add diffutils lib for test only
> Improve text extraction tests
> -
>
> Key: PDFBOX-3044
> URL: https://
> Improve text extraction tests
> -
>
> Key: PDFBOX-3044
> URL: https://issues.apache.org/jira/browse/PDFBOX-3044
> Project: PDFBox
> Issue Type: Bug
>Affects Versions: 1.8.10, 1.8.11, 2.0.0
>
3042:
-
Commit 1710057 from [~tilman] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1710057 ]
PDFBOX-3042: remove dead code
> Bad space calculation in text extraction
>
>
> Key: PDFBOX-3042
> URL: h
nt matrix is \[0.001 0 0 0.001 0 0].
{quote}
With the file from PDFBOX-2794, 1 / 0.001 = 1000. And that is multiplied with
the space width 277.832, so the base value is 277832! A Tf value of 8 means
that the size is now 656.
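The first step of the arithmetic in this comment can be reproduced as follows. The sketch only contrasts dividing by the font matrix scale (which yields the 277832 reported above) with multiplying by it, which is how the PDF specification maps glyph space to text space. It does not attempt to reproduce the 656 figure, and it is not a description of what either PDFBox branch actually computes; the class and method names are hypothetical.

```java
final class FontMatrixArithmetic {
    // The calculation described in the comment: (1 / 0.001) * 277.832.
    static double divideByScale(double glyphWidth, double fontMatrixScale) {
        return (1 / fontMatrixScale) * glyphWidth;
    }

    // Glyph space to text space per the PDF specification: apply the font matrix scale.
    static double multiplyByScale(double glyphWidth, double fontMatrixScale) {
        return glyphWidth * fontMatrixScale;
    }

    public static void main(String[] args) {
        double spaceWidth = 277.832;    // space width quoted in the comment
        double fontMatrixScale = 0.001; // from the font matrix [0.001 0 0 0.001 0 0]
        System.out.println("divided:    " + divideByScale(spaceWidth, fontMatrixScale));
        System.out.println("multiplied: " + multiplyByScale(spaceWidth, fontMatrixScale));
    }
}
```

The two results differ by a factor of a million, which is why a wrong glyph-space-to-text-space factor produces wildly implausible widths.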
> Text extraction getting zero font height, bad widths, and ? for text
32! A Tf value of 8 means
that the size is now 656.
> Text extraction getting zero font height, bad widths, and ? for text in this
> PDF with Type 3 Fonts
> --
>
>
[
https://issues.apache.org/jira/browse/PDFBOX-2508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated PDFBOX-2508:
Labels: type3 (was: )
> Text extraction getting zero font height, bad widths, and ?
ible to depend less on the
averageCharWidth in PDFTextStripper since it seemed like it was at least
partially a workaround for that issue
> Text extraction broken for jbl example
> --
>
> Key: PDFBOX-3028
>
ring[92.585,79.52399 fs=9.3624 xscale=9.268776 height=7.961241
space=5.5056534 width=5.561264]C
{code}
After PDFBOX-3042.
> Text extraction broken for jbl example
> --
>
> Key: PDFBOX-3028
> URL: https://issues.apache.org
9 fs=9.3624 xscale=9.268776 height=7.961241
space=5.5056534 width=5.561264]C
{code}
After PDFBOX-3042.
> Text extraction broken for jbl example
> --
>
> Key: PDFBOX-3028
> URL: https://issues.apache.org/jira
399 fs=9.3624 xscale=9.268776 height=5.302922
space=51.546127 width=5.561264] {code}
> Text extraction broken for jbl example
> --
>
> Key: PDFBOX-3028
> URL: https://issues.apache.org/jira/browse/PDFBOX-3028
>
[
https://issues.apache.org/jira/browse/PDFBOX-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr resolved PDFBOX-3042.
-
Resolution: Fixed
> Bad space calculation in text extract
3042:
-
Commit 1709886 from [~tilman] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1709886 ]
PDFBOX-3042: add test files
> Bad space calculation in text extraction
>
>
> Key: PDFBOX-3042
> URL: https://
3042:
-
Commit 1709883 from [~tilman] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1709883 ]
PDFBOX-3042: don't multiply with fontSize, as this has already been done before
> Bad space calculation in text extraction
>
>
>
> Bad space calculation in text extraction
>
>
> Key: PDFBOX-3042
> URL: https://issues.apache.org/jira/browse/PDFBOX-3042
> Project: PDFBox
> Issue Type: Bug
> Com
Tilman Hausherr created PDFBOX-3042:
---
Summary: Bad space calculation in text extraction
Key: PDFBOX-3042
URL: https://issues.apache.org/jira/browse/PDFBOX-3042
Project: PDFBox
Issue Type
to check the 1.8.x branch yesterday. Maybe it's fixed in
1.8.11 as well? Pavel is complaining about 1.8.8. I'm going to check that later
> Text extraction reports zero character widths
> --
>
> Key: PDFBOX-25
04 height=5.326662 space=2.2251122
width=5.7788696]N
String[624.0,213.18 fs=1.0 xscale=8.004 height=5.326662 space=2.2251122
width=5.7788696]N
{code}
So what is the problem here?
> Text extraction reports zero character widths
> --
>
>
[
https://issues.apache.org/jira/browse/PDFBOX-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andreas Lehmkühler updated PDFBOX-2584:
---
Affects Version/s: (was: 2.0.0)
> Text extraction reports zero character wid
22
width=4.4501953]8
String[624.0,213.18 fs=1.0 xscale=8.004 height=5.324927 space=2.2251122
width=5.7788696]N
> Text extraction reports zero character widths
> --
>
> Key: PDFBOX-2584
> URL: https://issue
[
https://issues.apache.org/jira/browse/PDFBOX-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr resolved PDFBOX-3038.
-
Resolution: Fixed
Assignee: Tilman Hausherr
Setting to resolved;
- Text extraction
.913 space=20.25 width=2.25]
{code}
> Text extraction shows glyphs with zero height
> -
>
> Key: PDFBOX-3038
> URL: https://issues.apache.org/jira/browse/PDFBOX-3038
> Project: PDFBox
>
ight (speech by FTC head).
> Text extraction shows glyphs with zero height
> -
>
> Key: PDFBOX-3038
> URL: https://issues.apache.org/jira/browse/PDFBOX-3038
> Project: PDFBox
>
3038:
-
Commit 1709647 from [~tilman] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1709647 ]
PDFBOX-3038: return BBox from font descriptor if font BBox empty
> Text extraction shows glyphs with zero height
> -
>
>
3038:
-
Commit 1709646 from [~tilman] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1709646 ]
PDFBOX-3038: add test files
> Text extraction shows glyphs with zero height
> -
>
> Key: PDFBOX-3038
> U
[
https://issues.apache.org/jira/browse/PDFBOX-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated PDFBOX-3038:
Attachment: PDFBOX-3038-001033-p2.pdf
> Text extraction shows glyphs with zero hei
[
https://issues.apache.org/jira/browse/PDFBOX-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr resolved PDFBOX-3037.
-
Resolution: Fixed
> Text extraction decodes image fi
3037:
-
Commit 1709640 from [~tilman] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1709640 ]
PDFBOX-3037: check for image to avoid decoding them when doing text extraction
> Text extraction decodes image files
> ---
>
>
3037:
-
Commit 1709639 from [~tilman] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1709639 ]
PDFBOX-3037: add DrawObject method for content extractor engine
> Text extraction decodes image files
> ---
>
> Key: PDFBOX-3037
>
[
https://issues.apache.org/jira/browse/PDFBOX-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated PDFBOX-3038:
Labels: regression (was: )
> Text extraction shows glyphs with zero hei
Tilman Hausherr created PDFBOX-3038:
---
Summary: Text extraction shows glyphs with zero height
Key: PDFBOX-3038
URL: https://issues.apache.org/jira/browse/PDFBOX-3038
Project: PDFBox
Issue
Tilman Hausherr created PDFBOX-3037:
---
Summary: Text extraction decodes image files
Key: PDFBOX-3037
URL: https://issues.apache.org/jira/browse/PDFBOX-3037
Project: PDFBox
Issue Type: Bug
[
https://issues.apache.org/jira/browse/PDFBOX-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated PDFBOX-3037:
Attachment: 001131.pdf
> Text extraction decodes image fi
rted, 1.8 supports RC4, and 2.0 also supports
AES.
> Text Extraction on Android
> --
>
> Key: PDFBOX-586
> URL: https://issues.apache.org/jira/browse/PDFBOX-586
> Project: PDFBox
> Issue Type: Improvem