[jira] [Commented] (PDFBOX-1977) LZWFilter fails / TestFilters is non-determinate

2014-03-12 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13931990#comment-13931990 ] Michael McCandless commented on PDFBOX-1977: It's intentional that the test

[jira] [Commented] (PDFBOX-1977) LZWFilter fails / TestFilters is non-determinate

2014-03-12 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13932018#comment-13932018 ] Michael McCandless commented on PDFBOX-1977: I think just leave the random

[jira] [Commented] (PDFBOX-1977) LZWFilter fails / TestFilters is non-determinate

2014-03-12 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13932046#comment-13932046 ] Michael McCandless commented on PDFBOX-1977: [~tilman] I'm confused: the test

[jira] [Commented] (PDFBOX-1273) java.io.IOException: Error: Unknown annotation type null

2013-03-27 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13615797#comment-13615797 ] Michael McCandless commented on PDFBOX-1273: Looks like this is the same

[jira] [Commented] (PDFBOX-1320) NPE in extractEmbeddedDocuments

2012-05-23 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13281500#comment-13281500 ] Michael McCandless commented on PDFBOX-1320: Good catch Sumuli! We can also

[jira] [Updated] (PDFBOX-1320) NPE in extractEmbeddedDocuments

2012-05-23 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated PDFBOX-1320: --- Attachment: PDFBOX-1320.patch I committed the fix to Tika's PDFParser. Here's a

[jira] [Commented] (PDFBOX-1299) BaseParser.readUntilEndOfStream can stop too early, causing IOException on valid PDFs

2012-05-21 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280121#comment-13280121 ] Michael McCandless commented on PDFBOX-1299: Hi, any feedback on my patch

[jira] [Commented] (PDFBOX-1299) BaseParser.readUntilEndOfStream can stop too early, causing IOException on valid PDFs

2012-05-21 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280238#comment-13280238 ] Michael McCandless commented on PDFBOX-1299: Thanks Timo!

[jira] [Commented] (PDFBOX-1305) Text extraction takes huge amount of time on some files

2012-05-12 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13273931#comment-13273931 ] Michael McCandless commented on PDFBOX-1305: I just tested this on PDFBox's

[jira] [Created] (PDFBOX-1303) Tika's PDFParser fails to parse documents embedded in a PDF Package

2012-05-05 Thread Michael McCandless (JIRA)
Michael McCandless created PDFBOX-1303: -- Summary: Tika's PDFParser fails to parse documents embedded in a PDF Package Key: PDFBOX-1303 URL: https://issues.apache.org/jira/browse/PDFBOX-1303

[jira] [Updated] (PDFBOX-1303) Tika's PDFParser fails to parse documents embedded in a PDF Package

2012-05-05 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated PDFBOX-1303: --- Attachment: testPDFPackage.pdf PDFBOX-1303.patch Patch w/ test

[jira] [Updated] (PDFBOX-1297) ExtractText fails to extract text from packaged PDFs

2012-05-04 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated PDFBOX-1297: --- Attachment: testPDFPackage.pdf PDFBOX-1297.patch Patch, adding

[jira] [Created] (PDFBOX-1299) BaseParser.readUntilEndOfStream can stop too early, causing IOException on valid PDFs

2012-04-29 Thread Michael McCandless (JIRA)
Michael McCandless created PDFBOX-1299: -- Summary: BaseParser.readUntilEndOfStream can stop too early, causing IOException on valid PDFs Key: PDFBOX-1299 URL: https://issues.apache.org/jira/browse/PDFBOX-1299

[jira] [Updated] (PDFBOX-1299) BaseParser.readUntilEndOfStream can stop too early, causing IOException on valid PDFs

2012-04-29 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated PDFBOX-1299: --- Attachment: TX0819_2009-07-27_Windstream-TCG_Agreement.pdf Test PDF showing the

[jira] [Updated] (PDFBOX-1299) BaseParser.readUntilEndOfStream can stop too early, causing IOException on valid PDFs

2012-04-29 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated PDFBOX-1299: --- Attachment: PDFBOX-1299.patch Attached patch that at least fixes this one document.

[jira] [Updated] (PDFBOX-1299) BaseParser.readUntilEndOfStream can stop too early, causing IOException on valid PDFs

2012-04-29 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated PDFBOX-1299: --- Attachment: (was: TX0819_2009-07-27_Windstream-TCG_Agreement.pdf)

[jira] [Updated] (PDFBOX-1299) BaseParser.readUntilEndOfStream can stop too early, causing IOException on valid PDFs

2012-04-29 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated PDFBOX-1299: --- Attachment: Tracey_Prather_31-Dec-2010_211843_2011Portfolio.pdf Sorry, wrong

[jira] [Created] (PDFBOX-1297) ExtractText fails to extract text from packaged PDFs

2012-04-26 Thread Michael McCandless (JIRA)
Michael McCandless created PDFBOX-1297: -- Summary: ExtractText fails to extract text from packaged PDFs Key: PDFBOX-1297 URL: https://issues.apache.org/jira/browse/PDFBOX-1297 Project: PDFBox

[jira] [Updated] (PDFBOX-1297) ExtractText fails to extract text from packaged PDFs

2012-04-26 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated PDFBOX-1297: --- Attachment: PDFPackage.pdf Example document showing that ExtractText doesn't