[jira] [Resolved] (TIKA-471) Avoid Charset name bottleneck when multiple threads are using HtmlParser

2012-07-08 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-471. Resolution: Fixed Fix Version/s: 1.2 Assignee: Jukka Zitting (was: Ken Krugler) As

[jira] [Resolved] (TIKA-502) Add programming language mime-types

2012-07-08 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-502. Resolution: Fixed Fix Version/s: 1.2 Assignee: Jukka Zitting (was: Ken Krugler)

[jira] [Resolved] (TIKA-458) Specify HTMLHandler via Context

2012-07-08 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-458. Resolution: Won't Fix Resolving as Won't Fix due to lack of activity and the fact that the existing

[jira] [Resolved] (TIKA-430) Automatically let all valid XHTML 1.0 attributes through from HTML documents

2012-07-08 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-430. Resolution: Incomplete Resolving as incomplete since after two years there still isn't a patch for

[jira] [Resolved] (TIKA-242) Incremental configuration AutoDetectParser

2012-07-08 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-242. Resolution: Duplicate Resolving as a duplicate of the auto-loading mechanisms we added for detectors

[jira] [Commented] (TIKA-948) Embedded PDF extracted incorrectly as MS Works file from Word 97-2003 doc

2012-07-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409002#comment-13409002 ] Michael McCandless commented on TIKA-948: Thanks for taking this Nick! Can you add

[jira] [Commented] (TIKA-948) Embedded PDF extracted incorrectly as MS Works file from Word 97-2003 doc

2012-07-08 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409024#comment-13409024 ] Nick Burch commented on TIKA-948: If someone feels keen, we could add CompObj decoding.

[jira] [Commented] (TIKA-456) Support timeouts for parsers

2012-07-08 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409028#comment-13409028 ] Luis Filipe Nassif commented on TIKA-456: Does ForkParser use a kind of timeout

[jira] [Resolved] (TIKA-431) Tika currently misuses the HTTP Content-Encoding header, and does not seem to use the charset part of the Content-Type header properly.

2012-07-08 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-431. Resolution: Fixed Fix Version/s: 1.2 Assignee: Jukka Zitting (was: Ken Krugler) In

[jira] [Commented] (TIKA-906) Headers, footers, and footnotes not extracted from Pages documents

2012-07-08 Thread Dave Meikle (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409095#comment-13409095 ] Dave Meikle commented on TIKA-906: Support for AutoPageNumbers added in r1358856.

[jira] [Commented] (TIKA-948) Embedded PDF extracted incorrectly as MS Works file from Word 97-2003 doc

2012-07-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409094#comment-13409094 ] Michael McCandless commented on TIKA-948: bq. However, it doesn't look like it

[jira] [Updated] (TIKA-906) Headers, footers, and footnotes not extracted from Pages documents

2012-07-08 Thread Dave Meikle (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dave Meikle updated TIKA-906: Fix Version/s: 1.2 (was: 1.3) Headers, footers, and footnotes not extracted

FYI: text/plain and text/html media types now come with charset info

2012-07-08 Thread Jukka Zitting
Hi, As of revision 1358858, Tika returns the detected character encoding as part of the content type metadata field. For example, instead of text/plain, the returned content type will be text/plain; charset=UTF-8 for a UTF-8 encoded text document. This is conceptually correct (see TIKA-431), but
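
Code that compares the content type field against a bare string such as text/plain may need to tolerate the added charset parameter. The following is only a minimal sketch of one way to do that with the Java standard library; the class name ContentTypeCharset and its two helper methods are hypothetical and are not part of Tika. Real code may be better served by Tika's own MediaType class.

```java
import java.nio.charset.Charset;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper, not a Tika API: splits a content type value such as
// "text/plain; charset=UTF-8" into its base type and charset parameter.
public class ContentTypeCharset {

    /** Returns the media type without parameters, e.g. "text/plain". */
    static String baseType(String contentType) {
        int semicolon = contentType.indexOf(';');
        String base = semicolon < 0 ? contentType : contentType.substring(0, semicolon);
        return base.trim().toLowerCase(Locale.ROOT);
    }

    /** Returns the charset parameter, if present and supported by this JVM. */
    static Optional<Charset> charset(String contentType) {
        String[] parts = contentType.split(";");
        // parts[0] is the base type; parameters start at index 1.
        for (int i = 1; i < parts.length; i++) {
            String[] keyValue = parts[i].split("=", 2);
            if (keyValue.length == 2 && keyValue[0].trim().equalsIgnoreCase("charset")) {
                String name = keyValue[1].trim().replace("\"", "");
                try {
                    return Optional.of(Charset.forName(name));
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name: treat as absent.
                    return Optional.empty();
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String contentType = "text/plain; charset=UTF-8";
        System.out.println(baseType(contentType));
        System.out.println(charset(contentType).map(Charset::name).orElse("unknown"));
    }
}
```

Comparing baseType(value) rather than the raw value keeps an existing check for text/plain or text/html working whether or not the charset parameter is present.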

[jira] [Resolved] (TIKA-892) Tika does not use the HTML5 meta charset tag when determining charset

2012-07-08 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-892. Resolution: Fixed Fix Version/s: 1.2 Assignee: Jukka Zitting (was: Ken Krugler)

[jira] [Commented] (TIKA-456) Support timeouts for parsers

2012-07-08 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409101#comment-13409101 ] Jukka Zitting commented on TIKA-456: bq. Does ForkParser use a kind of timeout

[jira] [Commented] (TIKA-885) Possible ConcurrentModificationException while accessing Metadata produced by ParsingReader

2012-07-08 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409102#comment-13409102 ] Jukka Zitting commented on TIKA-885: Hmm, that is a good point! I guess the best way to

[jira] [Resolved] (TIKA-815) Tika parsers should handle failures more gracefully

2012-07-08 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-815. Resolution: Duplicate Resolving this as a duplicate of all the followup issues mentioned above.