[jira] [Commented] (PDFBOX-2250) Improve XRef self healing mechanism

2014-10-21 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14178547#comment-14178547
 ] 

Tilman Hausherr commented on PDFBOX-2250:
-

ignore the last commit message (wrong issue)

 Improve XRef self healing mechanism
 ---

 Key: PDFBOX-2250
 URL: https://issues.apache.org/jira/browse/PDFBOX-2250
 Project: PDFBox
  Issue Type: Improvement
  Components: Parsing
Affects Versions: 1.8.6, 1.8.7, 2.0.0
Reporter: Andreas Lehmkühler
Assignee: Andreas Lehmkühler
 Fix For: 1.8.8, 2.0.0

 Attachments: 055794.pdf, 113223.pdf, 
 PDFBOX-2250-107425-empty-xref.pdf, PDFBOX-2250-110264-xref-zeronumber.pdf, 
 PDFBOX-2250-229205.pdf, PDFBOX-2250-233566.pdf


 PDFBOX-1769 introduced a self healing mechanism to repair corrupt XRef 
 offsets. But that one was just a starter and there remain a lot of issues to 
 be solved. I'm planning to solve at least some of them.
 All fixes and improvements are targeting the non-sequential parser and I 
 won't port those changes to the old parser.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PDFBOX-2250) Improve XRef self healing mechanism

2014-10-20 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14177084#comment-14177084
 ] 

ASF subversion and git services commented on PDFBOX-2250:
-

Commit 1633185 from [~lehmi] in branch 'pdfbox/branches/1.8'
[ https://svn.apache.org/r1633185 ]

PDFBOX-2250: replaced the 200 bytes seek repair mechanism with a brute force 
search, optimized the xref repair mechanism, lower the minimum start offset
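
To illustrate the difference between a bounded seek and a brute force search, here is a minimal sketch. It is not the committed PDFBox code: the class and method names are hypothetical, and it assumes the whole file is available as a byte array. It scans the entire buffer for the marker "N G obj" and rejects hits that are preceded by a digit, so that "11 0 obj" is not mistaken for "1 0 obj".

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only, not part of PDFBox.
final class BruteForceObjectSearch {

    /** Returns every offset at which the marker "objNr genNr obj" starts. */
    static List<Long> findObjectOffsets(byte[] pdf, int objNr, int genNr) {
        byte[] marker = (objNr + " " + genNr + " obj").getBytes(StandardCharsets.ISO_8859_1);
        List<Long> hits = new ArrayList<>();
        for (int i = 0; i + marker.length <= pdf.length; i++) {
            // A preceding digit means we are in the middle of a larger object number.
            boolean midNumber = i > 0 && pdf[i - 1] >= '0' && pdf[i - 1] <= '9';
            if (!midNumber && matchesAt(pdf, i, marker)) {
                hits.add((long) i);
            }
        }
        return hits;
    }

    private static boolean matchesAt(byte[] data, int pos, byte[] marker) {
        for (int j = 0; j < marker.length; j++) {
            if (data[pos + j] != marker[j]) {
                return false;
            }
        }
        return true;
    }
}
```

A full scan is linear in the file size, so it is slower than a fixed window, but it does not depend on how far the declared offset is from the real one.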



[jira] [Commented] (PDFBOX-2250) Improve XRef self healing mechanism

2014-10-20 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14177103#comment-14177103
 ] 

ASF subversion and git services commented on PDFBOX-2250:
-

Commit 1633186 from [~lehmi] in branch 'pdfbox/branches/1.8'
[ https://svn.apache.org/r1633186 ]

PDFBOX-2250: skip empty xref table followed by trailer, leave call that will 
create empty instead of null curXrefTrailerObj when xref table is empty (merged 
from trunk)



[jira] [Commented] (PDFBOX-2250) Improve XRef self healing mechanism

2014-10-20 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14177115#comment-14177115
 ] 

ASF subversion and git services commented on PDFBOX-2250:
-

Commit 1633187 from [~lehmi] in branch 'pdfbox/branches/1.8'
[ https://svn.apache.org/r1633187 ]

PDFBOX-2250: include key for Invalid object stream xref object reference 
IOException, treat fileOffset == 0 like fileOffset == null (merged from trunk)



[jira] [Commented] (PDFBOX-2250) Improve XRef self healing mechanism

2014-10-19 Thread JIRA

[ 
https://issues.apache.org/jira/browse/PDFBOX-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176290#comment-14176290
 ] 

Andreas Lehmkühler commented on PDFBOX-2250:


[~tilman] Thanks for the fast feedback. I already have a fix for the first 
issue (055794.pdf). I assumed that the relevant data of a valid pdf starts at 
offset 15, but in your case it already starts at offset 10. I'll have a look at 
the other one too.



[jira] [Commented] (PDFBOX-2250) Improve XRef self healing mechanism

2014-10-19 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176293#comment-14176293
 ] 

ASF subversion and git services commented on PDFBOX-2250:
-

Commit 1632895 from [~lehmi] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1632895 ]

PDFBOX-2250: optimized the xref repair mechanism, lower the minimum start offset



[jira] [Commented] (PDFBOX-2250) Improve XRef self healing mechanism

2014-10-19 Thread JIRA

[ 
https://issues.apache.org/jira/browse/PDFBOX-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176295#comment-14176295
 ] 

Andreas Lehmkühler commented on PDFBOX-2250:


I've improved the xref repair mechanism and fixed the issue with the not found 
object mentioned by Tilman (055794.pdf).

[~tilman] I had a look at 113223.pdf, but I can't find any object which can be 
found. Can you give me a pointer where to look?

*TODO*
- the xref repair algorithm simply searches for the nearest offset, which may 
fail if more than one xref table is present



[jira] [Commented] (PDFBOX-2250) Improve XRef self healing mechanism

2014-10-19 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176298#comment-14176298
 ] 

Tilman Hausherr commented on PDFBOX-2250:
-

In the version before the last changes, I got this error:

SCHWERWIEGEND: Can't find the object xref at offset 25554
Exception in thread main java.io.IOException: Error: Expected a long type at 
offset 25554, instead got '?Tl*OjlP^d8jp1Y#@J+\)cfaMC\?Y+WgkWs.4@'

Now it works without any error or warning and renders perfectly. The xref 
offset is wrong, and every object offset in the xref is wrong, e.g. the first one is 
said to be at 16 but is really at 17. The next one is said to be at 69 but is 
really at 71, and so on.
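
Offsets that are off by one or two bytes, as in the file described above, can be repaired by looking for the object marker closest to the declared position. The following is a minimal sketch of such a nearest-offset search, assuming the file is held in a byte array. The names are hypothetical and this is not the PDFBox implementation.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only, not part of PDFBox.
final class NearestOffsetRepair {

    /**
     * Returns the offset of "objNr genNr obj" that is closest to the declared
     * offset, or -1 if the marker does not occur in the buffer at all.
     */
    static long repairOffset(byte[] pdf, int objNr, int genNr, long declared) {
        byte[] marker = (objNr + " " + genNr + " obj").getBytes(StandardCharsets.ISO_8859_1);
        int start = (int) Math.max(0, Math.min(declared, pdf.length));
        // Widen the search one byte at a time in both directions.
        for (int delta = 0; delta <= pdf.length; delta++) {
            if (markerAt(pdf, start + delta, marker)) {
                return start + delta;
            }
            if (markerAt(pdf, start - delta, marker)) {
                return start - delta;
            }
        }
        return -1;
    }

    private static boolean markerAt(byte[] pdf, int pos, byte[] marker) {
        if (pos < 0 || pos + marker.length > pdf.length) {
            return false;
        }
        // Reject matches that start in the middle of a larger object number.
        if (pos > 0 && pdf[pos - 1] >= '0' && pdf[pos - 1] <= '9') {
            return false;
        }
        for (int j = 0; j < marker.length; j++) {
            if (pdf[pos + j] != marker[j]) {
                return false;
            }
        }
        return true;
    }
}
```

Picking the nearest hit is a heuristic: if a file contains several revisions of the same object, the closest marker is not necessarily the right one.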



[jira] [Commented] (PDFBOX-2250) Improve XRef self healing mechanism

2014-09-12 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131655#comment-14131655
 ] 

ASF subversion and git services commented on PDFBOX-2250:
-

Commit 1624567 from [~tilman] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1624567 ]

PDFBOX-2250: leave call that will create empty instead of null 
curXrefTrailerObj when xref table is empty



[jira] [Commented] (PDFBOX-2250) Improve XRef self healing mechanism

2014-08-25 Thread Timo Boehme (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109074#comment-14109074
 ] 

Timo Boehme commented on PDFBOX-2250:
-

[~tilman] your patch will treat an object from the xref table with offset 0 
the same as a non-declared object id, i.e. as a null object. While this might 
be obvious to a (C) programmer (0==null), it is not part of the PDF 
specification (at least I couldn't find it). Furthermore, with the ongoing 
xref offset healing project we try to find the correct offset for an object 
id. Treating the object as null only because it happens to have offset 0 
rather than some other wrong offset would prevent that. Thus I'm not sure 
this patch is a good idea. On the other hand, offset 0 will be wrong in any 
case (assuming the required header) and is quite rare, so if we have a 
document which can be cured by this patch it might be ok.



[jira] [Commented] (PDFBOX-2250) Improve XRef self healing mechanism

2014-08-25 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109295#comment-14109295
 ] 

John Hewson commented on PDFBOX-2250:
-

[~lehmi], the spec says that, but the [Adobe Supplement to the ISO 
32000|http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/adobe_supplement_iso32000.pdf], 
which describes Acrobat's behaviour, adds that:

{quote}
Acrobat viewers require only that the header appear somewhere within the first 
1024 bytes of the file. 
{quote}

Such PDFs are not unheard of: PDFParser#parseHeader() contains code which scans 
the PDF for the header (although it doesn't limit the scan to 1024 bytes and 
contains some misleading comments).
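
A scan that follows the Acrobat rule quoted above could look like the sketch below. It is not the code from PDFParser#parseHeader(): the class name and the constant are hypothetical, and it assumes that the complete "%PDF-" marker has to lie inside the first 1024 bytes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only, not part of PDFBox.
final class HeaderScan {

    /** Window in which Acrobat viewers accept the header, per the Adobe supplement. */
    static final int HEADER_WINDOW = 1024;

    /** Returns the offset of "%PDF-" within the window, or -1 if it is absent. */
    static int findHeaderOffset(byte[] pdf) {
        byte[] marker = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        int limit = Math.min(pdf.length, HEADER_WINDOW);
        for (int i = 0; i + marker.length <= limit; i++) {
            boolean match = true;
            for (int j = 0; j < marker.length; j++) {
                if (pdf[i + j] != marker[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return i;
            }
        }
        return -1;
    }
}
```

A result greater than 0 means there are bytes in front of the header. Whether xref offsets in such a file count from the start of the file or from the header is a separate question this sketch does not answer.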



[jira] [Commented] (PDFBOX-2250) Improve XRef self healing mechanism

2014-08-25 Thread JIRA

[ 
https://issues.apache.org/jira/browse/PDFBOX-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109320#comment-14109320
 ] 

Andreas Lehmkühler commented on PDFBOX-2250:


[~jahewson] I guess that advanced feature was added to skip rubbish in front 
of the file header and/or a malformed header, but it wasn't meant to be used to 
allow constructs like the one you've mentioned above.



[jira] [Commented] (PDFBOX-2250) Improve XRef self healing mechanism

2014-08-25 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109343#comment-14109343
 ] 

John Hewson commented on PDFBOX-2250:
-

Allowing rubbish in front of the file header is _almost_ the same construct, 
as even in malformed documents the header is usually at the beginning of a 
line. The end result is the same though: we do allow an object at offset 0.



[jira] [Commented] (PDFBOX-2250) Improve XRef self healing mechanism

2014-08-23 Thread JIRA

[ 
https://issues.apache.org/jira/browse/PDFBOX-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14107959#comment-14107959
 ] 

Andreas Lehmkühler commented on PDFBOX-2250:


No, according to the spec a pdf shall start with the pdf version:
{quote}
7.5.2 File Header
The first line of a PDF file shall be a header consisting of the 5 characters 
%PDF- followed by a version number of the form 1.N, where N is a digit between 
0 and 7.
{quote}
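
The strict rule from section 7.5.2 can be written as a single pattern. This is a minimal sketch with hypothetical names, assuming the caller passes the first line of the file without its end-of-line marker. A lenient parser would not reject a file on this check alone.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only, not part of PDFBox.
final class StrictHeaderCheck {

    // "%PDF-" followed by "1.N", where N is a digit between 0 and 7.
    private static final Pattern HEADER = Pattern.compile("%PDF-1\\.[0-7]");

    /** Returns true only if the line is exactly a header as described in 7.5.2. */
    static boolean hasSpecHeader(String firstLine) {
        return HEADER.matcher(firstLine).matches();
    }
}
```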




[jira] [Commented] (PDFBOX-2250) Improve XRef self healing mechanism

2014-08-22 Thread JIRA

[ 
https://issues.apache.org/jira/browse/PDFBOX-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106686#comment-14106686
 ] 

Andreas Lehmkühler commented on PDFBOX-2250:


IMHO it's ok to treat a 0 offset as a null offset, as such a value is invalid 
and thus doesn't make any sense. 

In the mentioned pdf the 0 offset belongs to the object 3 0, which is 
referenced in the catalog but doesn't exist in the pdf.
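
The rule "treat fileOffset == 0 like fileOffset == null" can be sketched as a lookup that hides zero offsets from the caller. The class below is hypothetical and only illustrates the idea. It is not how PDFBox stores its xref entries.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only, not part of PDFBox.
final class XrefLookup {

    // Key is "objNr genNr", value is the byte offset declared in the xref table.
    private final Map<String, Long> offsets = new HashMap<>();

    void put(int objNr, int genNr, long offset) {
        offsets.put(objNr + " " + genNr, offset);
    }

    /** Returns the declared offset, or null if it is missing or declared as 0. */
    Long usableOffset(int objNr, int genNr) {
        Long offset = offsets.get(objNr + " " + genNr);
        return (offset == null || offset == 0L) ? null : offset;
    }
}
```

With this rule a reference to object 3 0 in the file mentioned above resolves to a null object instead of an attempt to parse the file header as an object.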



[jira] [Commented] (PDFBOX-2250) Improve XRef self healing mechanism

2014-08-22 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14107365#comment-14107365
 ] 

John Hewson commented on PDFBOX-2250:
-

Wouldn't the following be a legal start of a PDF file, given that the header 
need only be in the first 1024 bytes? In this case, object 1 0 would be at 
offset 0.

{code}
1 0 obj
<<
/Count 1
/Type /Pages
/Kids [5 0 R]
>>
endobj
%PDF-1.4
2 0 obj
<<
/Pages 1 0 R
/Type /Catalog
>>
endobj
etc...
{code}

But maybe we shouldn't support such silliness.



[jira] [Commented] (PDFBOX-2250) Improve XRef self healing mechanism

2014-08-21 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105105#comment-14105105
 ] 

ASF subversion and git services commented on PDFBOX-2250:
-

Commit 1619296 from [~tilman] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1619296 ]

PDFBOX-2250: include key for Invalid object stream xref object reference 
IOException



[jira] [Commented] (PDFBOX-2250) Improve XRef self healing mechanism

2014-08-20 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104711#comment-14104711
 ] 

ASF subversion and git services commented on PDFBOX-2250:
-

Commit 1619255 from [~tilman] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1619255 ]

PDFBOX-2250: skip empty xref table followed by trailer



[jira] [Commented] (PDFBOX-2250) Improve XRef self healing mechanism

2014-08-12 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094303#comment-14094303
 ] 

ASF subversion and git services commented on PDFBOX-2250:
-

Commit 1617528 from [~lehmi] in branch 'pdfbox/branches/1.8'
[ https://svn.apache.org/r1617528 ]

PDFBOX-2250: don't override offset/trailer when parsing the cross reference 
stream of a hybrid xref table



[jira] [Commented] (PDFBOX-2250) Improve XRef self healing mechanism

2014-08-12 Thread JIRA

[ 
https://issues.apache.org/jira/browse/PDFBOX-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094317#comment-14094317
 ] 

Andreas Lehmkühler commented on PDFBOX-2250:


All mentioned pdfs are working now.



[jira] [Commented] (PDFBOX-2250) Improve XRef self healing mechanism

2014-08-12 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094515#comment-14094515
 ] 

Tilman Hausherr commented on PDFBOX-2250:
-

Yes!



[jira] [Commented] (PDFBOX-2250) Improve XRef self healing mechanism

2014-08-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093086#comment-14093086
 ] 

ASF subversion and git services commented on PDFBOX-2250:
-

Commit 1617339 from [~lehmi] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1617339 ]

PDFBOX-2250: parse an optional cross-reference stream to get object numbers for 
compressed objects as well



[jira] [Commented] (PDFBOX-2250) Improve XRef self healing mechanism

2014-08-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093088#comment-14093088
 ] 

ASF subversion and git services commented on PDFBOX-2250:
-

Commit 1617340 from [~lehmi] in branch 'pdfbox/branches/1.8'
[ https://svn.apache.org/r1617340 ]

PDFBOX-2250: parse an optional cross-reference stream to get object numbers for 
compressed objects as well



[jira] [Commented] (PDFBOX-2250) Improve XRef self healing mechanism

2014-08-11 Thread JIRA

[ 
https://issues.apache.org/jira/browse/PDFBOX-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093106#comment-14093106
 ] 

Andreas Lehmkühler commented on PDFBOX-2250:


In contrast to the old parser, the non-sequential one didn't parse 
cross-reference streams. I've added that feature so that object references, 
especially those for compressed objects, can be found now.

This should improve the parser once more when it comes to pdfs using object 
streams. I've used this [sample 
pdf|http://bewerbung.fh-kaernten.at/fileadmin/Anleitung-PDF-erstellen.pdf] 
provided by Martin Tappler on dev@pdfbox.
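
For readers unfamiliar with cross-reference streams: each entry is a fixed-width binary row whose field widths come from the /W array of the stream dictionary, and type 2 entries are the ones that point into object streams. The sketch below decodes a single row as described in ISO 32000-1, section 7.5.8. The class and field names are hypothetical, and it is not the parser code added in the commit.

```java
// Hypothetical helper for illustration only, not part of PDFBox.
final class XrefStreamEntry {

    final int type;     // 0 = free, 1 = uncompressed, 2 = stored in an object stream
    final long field2;  // type 1: byte offset, type 2: object number of the object stream
    final long field3;  // type 1: generation number, type 2: index within the object stream

    private XrefStreamEntry(int type, long field2, long field3) {
        this.type = type;
        this.field2 = field2;
        this.field3 = field3;
    }

    /** Decodes one row using the three field widths from the /W array. */
    static XrefStreamEntry decode(byte[] row, int[] w) {
        int pos = 0;
        // A missing type field (width 0) defaults to type 1.
        long type = (w[0] == 0) ? 1 : readBigEndian(row, pos, w[0]);
        pos += w[0];
        long second = readBigEndian(row, pos, w[1]);
        pos += w[1];
        long third = readBigEndian(row, pos, w[2]);
        return new XrefStreamEntry((int) type, second, third);
    }

    private static long readBigEndian(byte[] data, int pos, int width) {
        long value = 0;
        for (int i = 0; i < width; i++) {
            value = (value << 8) | (data[pos + i] & 0xFF);
        }
        return value;
    }

    boolean isCompressed() {
        return type == 2;
    }
}
```

With /W [1 2 1], the row 02 00 0C 03 decodes to a compressed object stored at index 3 of object stream 12.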

 Improve XRef self healing mechanism
 ---

 Key: PDFBOX-2250
 URL: https://issues.apache.org/jira/browse/PDFBOX-2250
 Project: PDFBox
  Issue Type: Improvement
  Components: Parsing
Affects Versions: 1.8.6, 1.8.7, 2.0.0
Reporter: Andreas Lehmkühler
Assignee: Andreas Lehmkühler

 PDFBOX-1769 introduced a self healing mechanism to repair corrupt XRef 
 offsets. But that one was just a starter and there remain a lot of issues to 
 be solved. I'm planning to solve at least some of them.
 All fixes and improvements are targeting the non-sequential parser and I 
 won't port those changes to the old parser.





[jira] [Commented] (PDFBOX-2250) Improve XRef self healing mechanism

2014-08-11 Thread Thomas Chojecki (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093152#comment-14093152
 ] 

Thomas Chojecki commented on PDFBOX-2250:
-

[~lehmi]
All fixes and improvements are targeting the non-sequential parser and I won't 
port those changes to the old parser.

The old parser already has this feature or a similar one, as far as I 
remember. It was needed as a fix for a third-party lib that creates documents 
whose offsets are mismatched by 2 or 3 bytes. You can find it in the PDFParser 
class, line 923 (resolveConflicts). 
https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/pdfparser/PDFParser.java#L923

I haven't read the whole conversation, but you wrote something about a 
200-byte self-healing range. This can cause problems with PDFs that are broken 
and include PDF documents as file attachments. The FlateDecode algorithm 
sometimes does not compress each block, so it will leave some plaintext PDF 
blocks which can contain parts like endstream or endobj. In this case it can 
happen that the self-healing algorithm runs into such an uncompressed block 
and fails to read the object.

I hope you understand what I mean :-) 

PS: something off-topic. I think the signature implementation only works with 
the old parser. So maybe someone can post this info on the website if the 
default parser implementation changes.
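
The concern about the 200-byte healing range can be illustrated with a small sketch. This is not the PDFBox implementation: WindowSearchSketch and findNear are hypothetical names, and the "closest match wins" rule is only an assumption about how a bounded search might behave. It looks for the text "N G obj" within a window around a possibly wrong offset, and any uncompressed data that happens to contain the same text can win if it is closer.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch (not PDFBox code): bounded search for an object header near an offset. */
final class WindowSearchSketch {

    /** Returns the offset of the "N G obj" occurrence closest to expectedOffset, or -1. */
    static int findNear(byte[] pdf, int objNumber, int generation, int expectedOffset, int window) {
        byte[] needle = (objNumber + " " + generation + " obj").getBytes(StandardCharsets.ISO_8859_1);
        int from = Math.max(0, expectedOffset - window);
        int to = Math.min(pdf.length - needle.length, expectedOffset + window);
        int best = -1;
        for (int i = from; i <= to; i++) {
            if (!matches(pdf, i, needle)) {
                continue;
            }
            // Skip matches that are only the tail of a longer object number, e.g. "15 0 obj".
            if (i > 0 && pdf[i - 1] >= '0' && pdf[i - 1] <= '9') {
                continue;
            }
            if (best < 0 || Math.abs(i - expectedOffset) < Math.abs(best - expectedOffset)) {
                best = i;
            }
        }
        return best;
    }

    /** True if needle occurs in data starting at pos. */
    static boolean matches(byte[] data, int pos, byte[] needle) {
        for (int j = 0; j < needle.length; j++) {
            if (data[pos + j] != needle[j]) {
                return false;
            }
        }
        return true;
    }
}
```

If the real header sits at offset 0 but stream content 18 bytes later also contains "5 0 obj", a stated offset of 16 resolves to the embedded text rather than the real object. That is the kind of false hit described above.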


[jira] [Commented] (PDFBOX-2250) Improve XRef self healing mechanism

2014-08-11 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093186#comment-14093186
 ] 

Tilman Hausherr commented on PDFBOX-2250:
-

How/where can I see the difference between before and after for the sample PDF? 
Should the rendering be different, or something else?


[jira] [Commented] (PDFBOX-2250) Improve XRef self healing mechanism

2014-08-11 Thread JIRA

[ 
https://issues.apache.org/jira/browse/PDFBOX-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093187#comment-14093187
 ] 

Andreas Lehmkühler commented on PDFBOX-2250:


[~tilman] I should have been more specific. Use PDFDebugger and have a look at 
the root node. There isn't any StructTreeRoot when using the non-sequential 
parser without my changes.


[jira] [Commented] (PDFBOX-2250) Improve XRef self healing mechanism

2014-08-11 Thread JIRA

[ 
https://issues.apache.org/jira/browse/PDFBOX-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093191#comment-14093191
 ] 

Andreas Lehmkühler commented on PDFBOX-2250:


[~tchojecki] I'm aware of that feature. But that was more of a brute-force 
search. The non-sequential parser follows the spec and relies on the 
xref table itself. This issue targets some of the known problems such as wrong 
offsets. The bugfix/improvement for compressed objects was just a side product, 
as I stumbled upon that missing part.


[jira] [Commented] (PDFBOX-2250) Improve XRef self healing mechanism

2014-08-11 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093197#comment-14093197
 ] 

Tilman Hausherr commented on PDFBOX-2250:
-

These files can no longer be opened with the nonsequential parser:
- PDFBOX-1577 (Missing root object specification in trailer)
- PDFBOX-1756 (Catalog cannot be found)
- PDFBOX-2251 (Missing root object specification in trailer)
- PDFBOX-1512 (immo-kurier_arsenal_93x62.pdf; Missing root object 
specification in trailer)




[jira] [Commented] (PDFBOX-2250) Improve XRef self healing mechanism

2014-08-11 Thread JIRA

[ 
https://issues.apache.org/jira/browse/PDFBOX-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093248#comment-14093248
 ] 

Andreas Lehmkühler commented on PDFBOX-2250:


Thanks for the fast feedback, I'll have a look


[jira] [Commented] (PDFBOX-2250) Improve XRef self healing mechanism

2014-08-05 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14086481#comment-14086481
 ] 

ASF subversion and git services commented on PDFBOX-2250:
-

Commit 1615956 from [~lehmi] in branch 'pdfbox/branches/1.8'
[ https://svn.apache.org/r1615956 ]

PDFBOX-2250: removed false warn message, 0 as count value is allowed


[jira] [Commented] (PDFBOX-2250) Improve XRef self healing mechanism

2014-08-05 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14086483#comment-14086483
 ] 

ASF subversion and git services commented on PDFBOX-2250:
-

Commit 1615957 from [~lehmi] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1615957 ]

PDFBOX-2250: removed false warn message, 0 as count value is allowed


[jira] [Commented] (PDFBOX-2250) Improve XRef self healing mechanism

2014-08-05 Thread JIRA

[ 
https://issues.apache.org/jira/browse/PDFBOX-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14086489#comment-14086489
 ] 

Andreas Lehmkühler commented on PDFBOX-2250:


Some of the files (e.g. testComment.pdf) Tim mentioned on the mailing list 
produced the following warn message 
{code}
Count in xref table is 0 at offset 68229
{code}
This was a false warning, as 0 is allowed as a count value within an xref table.
I've removed that message, and now Tim's PDFs no longer produce any of the 
given warn messages. 

I'm continuing with those files Tilman mentioned.
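
To make the point about the count value concrete, here is a minimal sketch of reading an xref subsection header of the form "first count". It is not the PDFBox parser: XRefSubsectionHeaderSketch and parse are hypothetical names. The only point is that a count of 0 is accepted as a valid, empty subsection rather than reported as a problem.

```java
/** Hypothetical sketch (not PDFBox code): parses an xref subsection header line. */
final class XRefSubsectionHeaderSketch {

    /** Returns {firstObjectNumber, count}, or null if the line is not a subsection header. */
    static long[] parse(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 2) {
            return null;
        }
        try {
            long first = Long.parseLong(parts[0]);
            long count = Long.parseLong(parts[1]);
            // Negative values are invalid, but a count of 0 is fine: the subsection is just empty.
            if (first < 0 || count < 0) {
                return null;
            }
            return new long[] {first, count};
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```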


[jira] [Commented] (PDFBOX-2250) Improve XRef self healing mechanism

2014-08-01 Thread JIRA

[ 
https://issues.apache.org/jira/browse/PDFBOX-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14082335#comment-14082335
 ] 

Andreas Lehmkühler commented on PDFBOX-2250:


I've started with a small set of [sample 
pdfs|http://svn.apache.org/repos/asf/tika/trunk/tika-parsers/src/test/resources/test-documents/]
 provided by TIKA (thanks to [~talli...@mitre.org] for the pointer)




[jira] [Commented] (PDFBOX-2250) Improve XRef self healing mechanism

2014-08-01 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14082338#comment-14082338
 ] 

ASF subversion and git services commented on PDFBOX-2250:
-

Commit 1615139 from [~lehmi] in branch 'pdfbox/branches/1.8'
[ https://svn.apache.org/r1615139 ]

PDFBOX-2250: detect XRef streams to avoid false positives when searching for 
the XRef table/stream


[jira] [Commented] (PDFBOX-2250) Improve XRef self healing mechanism

2014-08-01 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14082343#comment-14082343
 ] 

ASF subversion and git services commented on PDFBOX-2250:
-

Commit 1615141 from [~lehmi] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1615141 ]

PDFBOX-2250: detect XRef streams to avoid false positives when searching for 
the XRef table/stream


[jira] [Commented] (PDFBOX-2250) Improve XRef self healing mechanism

2014-08-01 Thread JIRA

[ 
https://issues.apache.org/jira/browse/PDFBOX-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14082350#comment-14082350
 ] 

Andreas Lehmkühler commented on PDFBOX-2250:


The former implementation wasn't able to detect XRef streams, which led to 
false positives ("Can't find the object xref at offset", e.g. in the TIKA pdf 
testPDF_childAttachments.pdf). 
After adding a detector, those messages vanished.
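
A detector of this kind could look roughly like the following sketch. It is not the committed code: XRefTargetSketch, classify and the 1024-byte look-ahead are assumptions made for illustration. An offset counts as valid either if the "xref" keyword starts there (classic table) or if an object starts there whose dictionary contains /Type /XRef (cross-reference stream).

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch (not PDFBox code): classifies what a startxref offset points at. */
final class XRefTargetSketch {

    enum Kind { TABLE, STREAM, UNKNOWN }

    // "N G obj" at the very start of the window, optionally preceded by whitespace.
    private static final Pattern OBJ_HEADER = Pattern.compile("\\A\\s*\\d+\\s+\\d+\\s+obj\\b");
    private static final Pattern XREF_TYPE = Pattern.compile("/Type\\s*/XRef\\b");

    static Kind classify(byte[] pdf, int offset) {
        if (offset < 0 || offset >= pdf.length) {
            return Kind.UNKNOWN;
        }
        // Assumed look-ahead size; large enough for a typical xref stream dictionary.
        int end = Math.min(pdf.length, offset + 1024);
        String window = new String(pdf, offset, end - offset, StandardCharsets.ISO_8859_1);
        if (window.startsWith("xref")) {
            return Kind.TABLE;
        }
        Matcher header = OBJ_HEADER.matcher(window);
        if (header.find()) {
            // Only inspect the dictionary part, i.e. everything before the stream data begins.
            int dictEnd = window.indexOf("stream", header.end());
            String dict = dictEnd >= 0
                    ? window.substring(header.end(), dictEnd)
                    : window.substring(header.end());
            if (XREF_TYPE.matcher(dict).find()) {
                return Kind.STREAM;
            }
        }
        return Kind.UNKNOWN;
    }
}
```

Without the STREAM branch, every file that uses a cross-reference stream would fall through to UNKNOWN and trigger the misleading message even though its offset is correct.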
