[
https://issues.apache.org/jira/browse/PDFBOX-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14591829#comment-14591829
]
Andreas Meier commented on PDFBOX-2831:
---
There are several positions in the file,
[
https://issues.apache.org/jira/browse/PDFBOX-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andreas Meier updated PDFBOX-2831:
--
Attachment: chya31marked.jpg
ArrayIndexOutOfBoundsException in mergeDiacritic() on extraction
[
https://issues.apache.org/jira/browse/PDFBOX-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595410#comment-14595410
]
Andreas Meier commented on PDFBOX-2831:
---
Thanks
ArrayIndexOutOfBoundsException in
[
https://issues.apache.org/jira/browse/PDFBOX-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593059#comment-14593059
]
Andreas Meier commented on PDFBOX-2831:
---
Personally, I don't speak any Arabic
[
https://issues.apache.org/jira/browse/PDFBOX-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593059#comment-14593059
]
Andreas Meier edited comment on PDFBOX-2831 at 6/19/15 6:05 AM:
[
https://issues.apache.org/jira/browse/PDFBOX-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14591447#comment-14591447
]
Andreas Meier commented on PDFBOX-2831:
---
I had to search for a file on the web,
[
https://issues.apache.org/jira/browse/PDFBOX-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andreas Meier updated PDFBOX-2831:
--
Description:
PDFBox may fail on extraction of text in method mergeDiacritic(TextPosition
Andreas Meier created PDFBOX-2831:
-
Summary: ArrayIndexOutOfBoundsException in mergeDiacritic() on
extraction of text with diacritic text
Key: PDFBOX-2831
URL: https://issues.apache.org/jira/browse/PDFBOX-2831
[
https://issues.apache.org/jira/browse/PDFBOX-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14638657#comment-14638657
]
Andreas Meier commented on PDFBOX-2272:
---
The patch
[
https://issues.apache.org/jira/browse/PDFBOX-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14640162#comment-14640162
]
Andreas Meier commented on PDFBOX-2272:
---
I did not attach the vertical.patch
[
https://issues.apache.org/jira/browse/PDFBOX-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andreas Meier updated PDFBOX-2272:
--
Attachment: (was: PDFTextStripper.java)
Can't extract vertical text correctly
[
https://issues.apache.org/jira/browse/PDFBOX-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andreas Meier updated PDFBOX-2272:
--
Attachment: pdfbox_new_vertical_text_extraction.patch
Can't extract vertical text correctly
Andreas Meier created PDFBOX-2879:
-
Summary: Wrong vertical text extraction for apache PDFBox 2.0.0
Key: PDFBOX-2879
URL: https://issues.apache.org/jira/browse/PDFBOX-2879
Project: PDFBox
[
https://issues.apache.org/jira/browse/PDFBOX-2879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andreas Meier updated PDFBOX-2879:
--
Attachment: Test16.pdf
Test15.pdf
Test14.pdf
Wrong vertical
[
https://issues.apache.org/jira/browse/PDFBOX-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andreas Meier updated PDFBOX-2272:
--
Attachment: PDFTextStripper.java
PDFTextStripper.java that supports extraction of rotated
[
https://issues.apache.org/jira/browse/PDFBOX-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625868#comment-14625868
]
Andreas Meier commented on PDFBOX-2272:
---
I made some small changes to the
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andreas Meier updated PDFBOX-2252:
--
Attachment: PDFTextStripper.java.patch
PDFTextStripper has problem with bilingual documents
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14629760#comment-14629760
]
Andreas Meier commented on PDFBOX-2252:
---
I am currently reworking the
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14629760#comment-14629760
]
Andreas Meier edited comment on PDFBOX-2252 at 7/16/15 1:56 PM:
[
https://issues.apache.org/jira/browse/PDFBOX-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14629782#comment-14629782
]
Andreas Meier commented on PDFBOX-2272:
---
Please have a look at Ticket PDFBOX-2252
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14630834#comment-14630834
]
Andreas Meier edited comment on PDFBOX-2252 at 7/17/15 6:14 AM:
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14630834#comment-14630834
]
Andreas Meier commented on PDFBOX-2252:
---
After some investigation I can say:
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14630834#comment-14630834
]
Andreas Meier edited comment on PDFBOX-2252 at 7/17/15 8:00 AM:
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andreas Meier updated PDFBOX-2252:
--
Attachment: atest.pdf
PDFTextStripper has problem with documents with mixed language
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andreas Meier updated PDFBOX-2252:
--
Attachment: wikipedia_dl_lyric_test.pdf
PDFTextStripper has problem with documents with mixed
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633613#comment-14633613
]
Andreas Meier commented on PDFBOX-2252:
---
I created a small test: atest.pdf
By the
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andreas Meier updated PDFBOX-2252:
--
Comment: was deleted
(was: The (( occurs, because Adobe Reader notices the strong RTL
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634577#comment-14634577
]
Andreas Meier edited comment on PDFBOX-2252 at 7/21/15 6:23 AM:
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634582#comment-14634582
]
Andreas Meier commented on PDFBOX-2252:
---
The (( occurs, because Adobe Reader notices
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andreas Meier updated PDFBOX-2252:
--
Attachment: overlap.jpg
PDFTextStripper has problem with documents with mixed language
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634565#comment-14634565
]
Andreas Meier edited comment on PDFBOX-2252 at 7/21/15 6:14 AM:
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634577#comment-14634577
]
Andreas Meier edited comment on PDFBOX-2252 at 7/21/15 9:43 AM:
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634577#comment-14634577
]
Andreas Meier edited comment on PDFBOX-2252 at 7/21/15 9:43 AM:
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634577#comment-14634577
]
Andreas Meier commented on PDFBOX-2252:
---
Yes, numbers are written LTR
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634577#comment-14634577
]
Andreas Meier edited comment on PDFBOX-2252 at 7/21/15 6:24 AM:
[
https://issues.apache.org/jira/browse/PDFBOX-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628012#comment-14628012
]
Andreas Meier commented on PDFBOX-2272:
---
Sorry that I didn't answer yesterday, I
[
https://issues.apache.org/jira/browse/PDFBOX-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14946604#comment-14946604
]
Andreas Meier edited comment on PDFBOX-2998 at 10/7/15 9:54 AM:
Thanks
[
https://issues.apache.org/jira/browse/PDFBOX-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andreas Meier updated PDFBOX-2998:
--
Attachment: DropCapSegmentation.jpg
DropCapExample5.pdf
[
https://issues.apache.org/jira/browse/PDFBOX-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14946604#comment-14946604
]
Andreas Meier edited comment on PDFBOX-2998 at 10/7/15 9:58 AM:
Thanks
[
https://issues.apache.org/jira/browse/PDFBOX-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14946604#comment-14946604
]
Andreas Meier commented on PDFBOX-2998:
---
Thanks for pointing that out.
> Enhance the text
[
https://issues.apache.org/jira/browse/PDFBOX-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14946324#comment-14946324
]
Andreas Meier commented on PDFBOX-2998:
---
I just wanted to fuel the discussion with my snippet.
My
[
https://issues.apache.org/jira/browse/PDFBOX-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14946324#comment-14946324
]
Andreas Meier edited comment on PDFBOX-2998 at 10/7/15 6:03 AM:
I just
[
https://issues.apache.org/jira/browse/PDFBOX-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939764#comment-14939764
]
Andreas Meier commented on PDFBOX-2998:
---
I would neither call it "line finding" nor "proper text to
[
https://issues.apache.org/jira/browse/PDFBOX-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andreas Meier updated PDFBOX-2998:
--
Attachment: TextBehindText.pdf
> Enhance the text extraction capabilities
>
[
https://issues.apache.org/jira/browse/PDFBOX-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939843#comment-14939843
]
Andreas Meier commented on PDFBOX-2998:
---
You are right, first we need to get the lower level
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14910107#comment-14910107
]
Andreas Meier edited comment on PDFBOX-2252 at 9/28/15 6:56 AM:
I tested
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14910107#comment-14910107
]
Andreas Meier commented on PDFBOX-2252:
---
I tested the latest patch with the documents.
There are
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933076#comment-14933076
]
Andreas Meier commented on PDFBOX-2252:
---
Most of the files I see do not provide any article
Andreas Meier created PDFBOX-2998:
-
Summary: Document layout analysis tools needed
Key: PDFBOX-2998
URL: https://issues.apache.org/jira/browse/PDFBOX-2998
Project: PDFBox
Issue Type: New
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933277#comment-14933277
]
Andreas Meier commented on PDFBOX-2252:
---
I have already studied one of the two papers you posted
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933277#comment-14933277
]
Andreas Meier edited comment on PDFBOX-2252 at 9/28/15 1:23 PM:
I have
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14906244#comment-14906244
]
Andreas Meier commented on PDFBOX-2252:
---
In fact, I don't know if there are some built-in classes
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14905915#comment-14905915
]
Andreas Meier edited comment on PDFBOX-2252 at 9/24/15 6:59 AM:
Even if
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14905915#comment-14905915
]
Andreas Meier edited comment on PDFBOX-2252 at 9/24/15 7:00 AM:
Even if
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14905915#comment-14905915
]
Andreas Meier commented on PDFBOX-2252:
---
Even if the patch doesn't look like it, I had a hard time
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andreas Meier updated PDFBOX-2252:
--
Attachment: BidiMirroring.txt
PDFTextStripper.java.patch
> PDFTextStripper has
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14904040#comment-14904040
]
Andreas Meier commented on PDFBOX-2252:
---
I want to provide a new patch to address this problem,
[
https://issues.apache.org/jira/browse/PDFBOX-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944999#comment-14944999
]
Andreas Meier commented on PDFBOX-2998:
---
The question is, when is a group of textpositions forming
[
https://issues.apache.org/jira/browse/PDFBOX-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944999#comment-14944999
]
Andreas Meier edited comment on PDFBOX-2998 at 10/6/15 1:09 PM:
The
[
https://issues.apache.org/jira/browse/PDFBOX-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944999#comment-14944999
]
Andreas Meier edited comment on PDFBOX-2998 at 10/6/15 1:14 PM:
The
[
https://issues.apache.org/jira/browse/PDFBOX-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andreas Meier updated PDFBOX-2998:
--
Comment: was deleted
(was: I think it is the right place to comment.
Writing an algorithm to
[
https://issues.apache.org/jira/browse/PDFBOX-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andreas Meier updated PDFBOX-2998:
--
Comment: was deleted
(was: I think it is the right place to comment.
Writing an algorithm to
[
https://issues.apache.org/jira/browse/PDFBOX-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15062122#comment-15062122
]
Andreas Meier commented on PDFBOX-2998:
---
I think it is the right place to comment.
Writing an
[
https://issues.apache.org/jira/browse/PDFBOX-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15062121#comment-15062121
]
Andreas Meier commented on PDFBOX-2998:
---
I think it is the right place to comment.
Writing an
[
https://issues.apache.org/jira/browse/PDFBOX-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15062124#comment-15062124
]
Andreas Meier commented on PDFBOX-2998:
---
I think it is the right place to comment.
Writing an
[
https://issues.apache.org/jira/browse/PDFBOX-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389166#comment-16389166
]
Andreas Meier edited comment on PDFBOX-4141 at 3/7/18 7:23 AM:
---
{quote}What
[
https://issues.apache.org/jira/browse/PDFBOX-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389166#comment-16389166
]
Andreas Meier commented on PDFBOX-4141:
---
{quote}What is the meaning of the table columns? Convert
[
https://issues.apache.org/jira/browse/PDFBOX-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16400154#comment-16400154
]
Andreas Meier commented on PDFBOX-4141:
---
The last few days I searched for files like the one that
[
https://issues.apache.org/jira/browse/PDFBOX-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andreas Meier closed PDFBOX-4141.
-
Resolution: Feedback Received
> Suppress control characters?
>
>
>
Andreas Meier created PDFBOX-4141:
-
Summary: Suppress control characters?
Key: PDFBOX-4141
URL: https://issues.apache.org/jira/browse/PDFBOX-4141
Project: PDFBox
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/PDFBOX-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andreas Meier updated PDFBOX-4141:
--
Attachment: Test_without_MW.txt
Test_with_MW_linux.jpg
[
https://issues.apache.org/jira/browse/PDFBOX-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andreas Meier updated PDFBOX-4141:
--
Attachment: Mapping_default_to_adobe.csv
> Suppress control characters?
>
[
https://issues.apache.org/jira/browse/PDFBOX-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387439#comment-16387439
]
Andreas Meier edited comment on PDFBOX-4141 at 3/6/18 8:13 AM:
---
Thanks for
[
https://issues.apache.org/jira/browse/PDFBOX-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387439#comment-16387439
]
Andreas Meier commented on PDFBOX-4141:
---
Thanks for the info Tilman.
Overriding the characters in
74 matches