[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15312597#comment-15312597
]
Tilman Hausherr commented on PDFBOX-2252:
-
I'm neutral on this, but any new issue should
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15312109#comment-15312109
]
Maruan Sahyoun commented on PDFBOX-2252:
+1
> PDFTextStripper has problem with documents with
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15312092#comment-15312092
]
Andreas Lehmkühler commented on PDFBOX-2252:
What is the status of this ticket? AFAIU
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083519#comment-15083519
]
ASF subversion and git services commented on PDFBOX-2252:
-
Commit 1723145 from
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14989943#comment-14989943
]
Tilman Hausherr commented on PDFBOX-2252:
-
A fourth mixed document can be found here:
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971210#comment-14971210
]
Maruan Sahyoun commented on PDFBOX-2252:
Thank you very much for your effort!
> PDFTextStripper
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971238#comment-14971238
]
Maruan Sahyoun commented on PDFBOX-2252:
From a quick scan the results for bidi text improved a
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971239#comment-14971239
]
Maruan Sahyoun commented on PDFBOX-2252:
From a quick scan the results for bidi text improved a
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970772#comment-14970772
]
Tim Allison commented on PDFBOX-2252:
-
Ha! I still feel guilty about this one so I'll finish this
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970116#comment-14970116
]
Tim Allison commented on PDFBOX-2252:
-
[~tilman], thank you for the ping. I've just re-kicked off
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970456#comment-14970456
]
Tilman Hausherr commented on PDFBOX-2252:
-
Uhm - the ping was in another issue, related to the
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945490#comment-14945490
]
Maruan Sahyoun commented on PDFBOX-2252:
The effort you put into that is of great help. No need
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944575#comment-14944575
]
Maruan Sahyoun commented on PDFBOX-2252:
[~tilman] [~talli...@apache.org] Thanks for the samples.
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945375#comment-14945375
]
Tim Allison commented on PDFBOX-2252:
-
Sounds good to me. Given current workload on other stuff, I
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943765#comment-14943765
]
Tilman Hausherr commented on PDFBOX-2252:
-
Here's an excellent test for mixed text extraction,
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14942341#comment-14942341
]
Tim Allison commented on PDFBOX-2252:
-
Y, I agree that 1.x vs 2.x would yield too broad of a
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14942255#comment-14942255
]
Tilman Hausherr commented on PDFBOX-2252:
-
No, that would be a separate thing, related to the 2.0
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14942253#comment-14942253
]
Tim Allison commented on PDFBOX-2252:
-
Y. Should I compare 2.0 trunk to 1.8.10 perhaps?
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940796#comment-14940796
]
ASF subversion and git services commented on PDFBOX-2252:
-
Commit 1706348 from
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940815#comment-14940815
]
Maruan Sahyoun commented on PDFBOX-2252:
[~tilman] could you give it another try?
>
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940950#comment-14940950
]
Maruan Sahyoun commented on PDFBOX-2252:
[~tilman] could you check with Tim if he's able to run
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940948#comment-14940948
]
ASF subversion and git services commented on PDFBOX-2252:
-
Commit 1706365 from
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941401#comment-14941401
]
Tilman Hausherr commented on PDFBOX-2252:
-
[~talli...@apache.org] are you able to run some text
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940868#comment-14940868
]
Tilman Hausherr commented on PDFBOX-2252:
-
Now it works, thanks!
> PDFTextStripper has problem
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934883#comment-14934883
]
Tilman Hausherr commented on PDFBOX-2252:
-
Seems that this happens only on Windows, as nobody
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933564#comment-14933564
]
Maruan Sahyoun commented on PDFBOX-2252:
lapdftext is GPLed software - we can't reuse the code.
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14910107#comment-14910107
]
Andreas Meier commented on PDFBOX-2252:
---
I tested the latest patch with the documents.
There are
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14910177#comment-14910177
]
Maruan Sahyoun commented on PDFBOX-2252:
thx - but the credit goes to you. I was only enhancing
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14910171#comment-14910171
]
ASF subversion and git services commented on PDFBOX-2252:
-
Commit 1705611 from
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933171#comment-14933171
]
Maruan Sahyoun commented on PDFBOX-2252:
please find enclosed some articles:
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14910194#comment-14910194
]
ASF subversion and git services commented on PDFBOX-2252:
-
Commit 1705619 from
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933147#comment-14933147
]
ASF subversion and git services commented on PDFBOX-2252:
-
Commit 1705654 from
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14910200#comment-14910200
]
Maruan Sahyoun commented on PDFBOX-2252:
[~lehmi] could you take a look at
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933085#comment-14933085
]
ASF subversion and git services commented on PDFBOX-2252:
-
Commit 1705636 from
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933140#comment-14933140
]
Andreas Lehmkühler commented on PDFBOX-2252:
IMHO the license itself is ok. Please add a note
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933076#comment-14933076
]
Andreas Meier commented on PDFBOX-2252:
---
Most of the files I see do not provide any article
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933115#comment-14933115
]
ASF subversion and git services commented on PDFBOX-2252:
-
Commit 1705645 from
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933277#comment-14933277
]
Andreas Meier commented on PDFBOX-2252:
---
I have already studied one of the two papers you posted
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14909835#comment-14909835
]
Maruan Sahyoun commented on PDFBOX-2252:
[~AndreasMeier] maybe you can test the latest patch with
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14909771#comment-14909771
]
Maruan Sahyoun commented on PDFBOX-2252:
I've tested the patch with the string [~jahewson]
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14909830#comment-14909830
]
Maruan Sahyoun commented on PDFBOX-2252:
I'll add a new patch. With that the result of the text
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908107#comment-14908107
]
Maruan Sahyoun commented on PDFBOX-2252:
I applied the patch in my working copy and it's breaking
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908359#comment-14908359
]
Tilman Hausherr commented on PDFBOX-2252:
-
Page 52 and 56 of bugzilla867751.pdf might help. I'll
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14907997#comment-14907997
]
Maruan Sahyoun commented on PDFBOX-2252:
[~tilman] [~talli...@apache.org] would you have some
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908004#comment-14908004
]
Tim Allison commented on PDFBOX-2252:
-
I'll take a look, probably not in govdocs1, but might be in
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908016#comment-14908016
]
Maruan Sahyoun commented on PDFBOX-2252:
thanks in advance - no need to rush - wanted to make
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14906244#comment-14906244
]
Andreas Meier commented on PDFBOX-2252:
---
In fact I don't know if there are some built-in classes
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14905928#comment-14905928
]
Tilman Hausherr commented on PDFBOX-2252:
-
[~msahyoun] could you reformat and commit the existing
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14905954#comment-14905954
]
Maruan Sahyoun commented on PDFBOX-2252:
I'll do
> PDFTextStripper has problem with documents
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14905967#comment-14905967
]
Maruan Sahyoun commented on PDFBOX-2252:
[~AndreasMeier] I quickly looked through the patch and
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14905973#comment-14905973
]
ASF subversion and git services commented on PDFBOX-2252:
-
Commit 1705010 from
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14906004#comment-14906004
]
Tilman Hausherr commented on PDFBOX-2252:
-
Only 2.0.
> PDFTextStripper has problem with
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14905915#comment-14905915
]
Andreas Meier commented on PDFBOX-2252:
---
Even if the patch doesn't look like it, I had a hard time
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14906440#comment-14906440
]
Maruan Sahyoun commented on PDFBOX-2252:
[~AndreasMeier] sorry for all the questions given the
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14906851#comment-14906851
]
Maruan Sahyoun commented on PDFBOX-2252:
[~AndreasMeier] I uploaded a text file for you where the
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14904787#comment-14904787
]
Tilman Hausherr commented on PDFBOX-2252:
-
The problem is that some old code doesn't follow the
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14904040#comment-14904040
]
Andreas Meier commented on PDFBOX-2252:
---
I want to provide a new patch to address this problem,
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636008#comment-14636008
]
John Hewson commented on PDFBOX-2252:
-
{quote}
We can't rely on the person who
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634582#comment-14634582
]
Andreas Meier commented on PDFBOX-2252:
---
The (( occurs because Adobe Reader notices
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634576#comment-14634576
]
Tilman Hausherr commented on PDFBOX-2252:
-
Yeah, I'm confused: when copying from
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634577#comment-14634577
]
Andreas Meier commented on PDFBOX-2252:
---
Yes, numbers are written ltr
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633613#comment-14633613
]
Andreas Meier commented on PDFBOX-2252:
---
I created a small test: atest.pdf
By the
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634121#comment-14634121
]
John Hewson commented on PDFBOX-2252:
-
How did you create atest.pdf? The Arabic words
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632570#comment-14632570
]
John Hewson commented on PDFBOX-2252:
-
Ha, yes, though I wasn't suggesting we start
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14630834#comment-14630834
]
Andreas Meier commented on PDFBOX-2252:
---
After some investigation I can say:
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631622#comment-14631622
]
Tilman Hausherr commented on PDFBOX-2252:
-
We need a google search that contains
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631704#comment-14631704
]
John Hewson commented on PDFBOX-2252:
-
We could also generate some test case with
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631705#comment-14631705
]
John Hewson commented on PDFBOX-2252:
-
We could also generate some test cases with
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631791#comment-14631791
]
Tilman Hausherr commented on PDFBOX-2252:
-
Yes, but it is risky to create
[
https://issues.apache.org/jira/browse/PDFBOX-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14630600#comment-14630600
]
John Hewson commented on PDFBOX-2252:
-
w.r.t. the large number of LTR and RTL