[jira] Commented: (NUTCH-781) Update Tika to v0.6 for the MimeType detection
[ https://issues.apache.org/jira/browse/NUTCH-781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12828548#action_12828548 ]

Julien Nioche commented on NUTCH-781:
---

{quote}
Did you forget to update conf/tika-mimetypes.xml?
{quote}

Indeed, well spotted, thanks.

{quote}
Related question: do we actually need our own version of the Tika config anymore? I saw there were some old issues that were fixed in the custom version, but I would guess those changes, if important, have already made their way into Tika?
{quote}

The version we had was the same as the one provided by Tika 0.4, so I suppose we could safely rely on the Tika defaults. MimeUtil currently requires tika-mimetypes.xml to be available on the classpath, but we could modify it so that it uses the default version from the Tika jar if nothing can be found in conf. Let's put that in a separate JIRA issue if we really want it; in the meantime I'll commit the v0.6 of tika-mimetypes.xml.

J.

Update Tika to v0.6 for the MimeType detection
---
Key: NUTCH-781
URL: https://issues.apache.org/jira/browse/NUTCH-781
Project: Nutch
Issue Type: Improvement
Reporter: Julien Nioche
Assignee: Julien Nioche
Fix For: 1.1

[from announcement]
Apache Tika, a subproject of Apache Lucene, is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries.
Apache Tika 0.6 contains a number of improvements and bug fixes. Details can be found in the changes file:
http://www.apache.org/dist/lucene/tika/CHANGES-0.6.txt

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
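For illustration only, the fallback proposed in the comment above (use the copy in conf if it is on the classpath, otherwise the default shipped in the Tika jar) might look roughly like the sketch below. It is not the actual MimeUtil code and calls no Tika API: the class name MimeConfigLocator and the method locate are hypothetical, and the bundled resource path is an assumption about where the Tika jar keeps its default file.

```java
import java.net.URL;
import java.util.Optional;

// Hypothetical helper, not part of Nutch or Tika.
class MimeConfigLocator {

    /**
     * Returns the first of the two resources that resolves on the classpath:
     * the local override (e.g. the file in conf) if present, otherwise the
     * bundled default. Empty if neither can be found.
     */
    static Optional<URL> locate(ClassLoader loader, String overrideName, String bundledName) {
        URL url = loader.getResource(overrideName);
        if (url == null) {
            // Nothing in conf: fall back to the default packaged with the library.
            url = loader.getResource(bundledName);
        }
        return Optional.ofNullable(url);
    }

    public static void main(String[] args) {
        ClassLoader loader = MimeConfigLocator.class.getClassLoader();
        Optional<URL> found = locate(
                loader,
                "tika-mimetypes.xml",                        // override, as found in conf
                "org/apache/tika/mime/tika-mimetypes.xml");  // ASSUMED path inside the Tika jar
        System.out.println(found.map(URL::toString)
                .orElse("no mime types definition found on the classpath"));
    }
}
```

With this ordering a site-specific file in conf still wins when present, so existing installations would keep their behaviour and only installations without the file would pick up the Tika defaults.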
[jira] Commented: (NUTCH-781) Update Tika to v0.6 for the MimeType detection
[ https://issues.apache.org/jira/browse/NUTCH-781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12828561#action_12828561 ]

Sami Siren commented on NUTCH-781:
---

{quote}
The version we had was the same as the one provided by Tika 0.4, so I suppose we could safely rely on the Tika defaults. MimeUtil currently requires tika-mimetypes.xml to be available on the classpath, but we could modify it so that it uses the default version from the Tika jar if nothing can be found in conf. Let's put that in a separate JIRA issue if we really want it; in the meantime I'll commit the v0.6 of tika-mimetypes.xml.
{quote}

OK, thanks.
[jira] Commented: (NUTCH-781) Update Tika to v0.6 for the MimeType detection
[ https://issues.apache.org/jira/browse/NUTCH-781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12828968#action_12828968 ]

Hudson commented on NUTCH-781:
---

Integrated in Nutch-trunk #1059 (See [http://hudson.zones.apache.org/hudson/job/Nutch-trunk/1059/])
: updated tika-mimetypes.xml
[jira] Commented: (NUTCH-781) Update Tika to v0.6 for the MimeType detection
[ https://issues.apache.org/jira/browse/NUTCH-781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12828275#action_12828275 ]

Sami Siren commented on NUTCH-781:
---

Did you forget to update conf/tika-mimetypes.xml?

Related question: do we actually need our own version of the Tika config anymore? I saw there were some old issues that were fixed in the custom version, but I would guess those changes, if important, have already made their way into Tika?
[jira] Commented: (NUTCH-781) Update Tika to v0.6 for the MimeType detection
[ https://issues.apache.org/jira/browse/NUTCH-781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12828458#action_12828458 ]

Hudson commented on NUTCH-781:
---

Integrated in Nutch-trunk #1058 (See [http://hudson.zones.apache.org/hudson/job/Nutch-trunk/1058/])
: upgrade tika to version 0.6
: upgrade tika to version 0.6