[jira] [Created] (TIKA-1138) I got empty body and empty title with some documents

2013-06-24 Thread Koutsoulis Philippe (JIRA)
Koutsoulis Philippe created TIKA-1138: Summary: I got empty body and empty title with some documents Key: TIKA-1138 URL: https://issues.apache.org/jira/browse/TIKA-1138 Project: Tika

[jira] [Commented] (TIKA-1138) I got empty body and empty title with some documents

2013-06-24 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13691929#comment-13691929 ] Nick Burch commented on TIKA-1138: That's often a sign that the parser can't handle them.

Your Gump Build(s)

2013-06-24 Thread Stefan Bodewig
Dear Community, Apache Gump builds some of your projects, and it is quite possible you don't know or have by now forgotten about it. More than half a year ago a technical problem forced us to turn off emails on build failures, as we would have been sending out lots of false alarms. Before we

[jira] [Updated] (TIKA-1138) Empty body and empty title with some XLS and TXT documents

2013-06-24 Thread Koutsoulis Philippe (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koutsoulis Philippe updated TIKA-1138: Description: *No error in logs* *+Extract from my Structured Text:+* {noformat} ?xml

[jira] [Commented] (TIKA-1138) Empty body and empty title with some XLS and TXT documents

2013-06-24 Thread Koutsoulis Philippe (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692001#comment-13692001 ] Koutsoulis Philippe commented on TIKA-1138: Updated _Informations_ section and

[jira] [Updated] (TIKA-1138) Empty body and empty title with some XLS and TXT documents

2013-06-24 Thread Koutsoulis Philippe (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koutsoulis Philippe updated TIKA-1138: Component/s: (was: general) parser Labels: (was: test)

Re: MP4Parser Triggers no ContentHandler.startDocument() and ContentHandler.endDocument() in one case

2013-06-24 Thread Nick Burch
On Wed, 29 May 2013, Nick Burch wrote: I'm not sure if we do have a properly documented policy on what a parser should do if it receives a file it can't handle. For ones that are invalid (e.g. corrupt), I believe an exception is the expected result. The case when the file seems valid, but can't

[jira] [Commented] (TIKA-1138) Empty body and empty title with some XLS and TXT documents

2013-06-24 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692044#comment-13692044 ] Nick Burch commented on TIKA-1138: I've just tried another excel file of that list, it's

[jira] [Commented] (TIKA-1130) .docx text extract leaves out some portions of text

2013-06-24 Thread Daniel Gibby (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692049#comment-13692049 ] Daniel Gibby commented on TIKA-1130: Looks like the POI bug

[jira] [Comment Edited] (TIKA-1130) .docx text extract leaves out some portions of text

2013-06-24 Thread Daniel Gibby (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692049#comment-13692049 ] Daniel Gibby edited comment on TIKA-1130 at 6/24/13 2:47 PM:

[jira] [Commented] (TIKA-1130) .docx text extract leaves out some portions of text

2013-06-24 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692058#comment-13692058 ] Nick Burch commented on TIKA-1130: Do a svn checkout of POI, run ant jar to build the

[jira] [Commented] (TIKA-1130) .docx text extract leaves out some portions of text

2013-06-24 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692110#comment-13692110 ] Tim Allison commented on TIKA-1130: Nick, I think I have to make modifications to

[jira] [Commented] (TIKA-1130) .docx text extract leaves out some portions of text

2013-06-24 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692203#comment-13692203 ] Nick Burch commented on TIKA-1130: Tim - you'll want to check out POI from SVN, do ant jar

[jira] [Commented] (TIKA-1130) .docx text extract leaves out some portions of text

2013-06-24 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692517#comment-13692517 ] Tim Allison commented on TIKA-1130: Maven proxy setting in my settings.xml file is