[jira] Commented: (NUTCH-781) Update Tika to v0.6 for the MimeType detection

2010-02-02 Thread Julien Nioche (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828548#action_12828548
 ] 

Julien Nioche commented on NUTCH-781:
-

> Did you forget to update conf/tika-mimetypes.xml?
Indeed - well spotted, thanks.

> Related question: do we actually need our own version of the Tika config
> anymore? I saw there were some old issues that were fixed in the custom
> version, but I would guess those changes, if important, have already made
> their way into Tika?
The version we had was the same as the one provided by Tika 0.4, so I suppose we
could safely rely on the Tika defaults. MimeUtil currently requires
tika-mimetypes.xml to be available on the classpath, but we could modify
that so that it uses the default version from the Tika jar if nothing can be
found in conf. Let's put that in a separate JIRA issue if we really want it. In
the meantime I'll commit v0.6 of tika-mimetypes.xml.

J.
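
[Editor's illustration] The fallback described in the comment above (use the
copy in conf if present, otherwise the default shipped in the Tika jar) could
look roughly like the sketch below. It is not the actual MimeUtil code. The
class and method names are hypothetical, and the resource path assumed for the
default inside the Tika jar (org/apache/tika/mime/tika-mimetypes.xml) should be
checked against the Tika version in use. Only JDK classes are used.

```java
import java.net.URL;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of Nutch or Tika.
public class MimeTypesLocator {

    // Name under which Nutch's conf directory would expose the file on the classpath.
    static final String CONF_RESOURCE = "tika-mimetypes.xml";

    // Assumed location of the default file inside the Tika jar (verify for your version).
    static final String TIKA_DEFAULT_RESOURCE = "org/apache/tika/mime/tika-mimetypes.xml";

    // Returns the URL of the first candidate that exists on the classpath, in order.
    static Optional<URL> firstAvailable(ClassLoader loader, List<String> candidates) {
        for (String name : candidates) {
            URL url = loader.getResource(name);
            if (url != null) {
                return Optional.of(url);
            }
        }
        return Optional.empty();
    }

    // Prefer the copy in conf, fall back to the default bundled with Tika.
    static Optional<URL> locateMimeTypes(ClassLoader loader) {
        return firstAvailable(loader, Arrays.asList(CONF_RESOURCE, TIKA_DEFAULT_RESOURCE));
    }

    public static void main(String[] args) {
        Optional<URL> found = locateMimeTypes(ClassLoader.getSystemClassLoader());
        System.out.println(found.map(URL::toString)
                .orElse("no tika-mimetypes.xml found on the classpath"));
    }
}
```

With this ordering, a site-specific file in conf still overrides the Tika
default, so existing customised installations would keep their behaviour.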


 Update Tika to v0.6  for the MimeType detection
 ---

 Key: NUTCH-781
 URL: https://issues.apache.org/jira/browse/NUTCH-781
 Project: Nutch
  Issue Type: Improvement
Reporter: Julien Nioche
Assignee: Julien Nioche
 Fix For: 1.1


 [from announcement]
 Apache Tika, a subproject of Apache Lucene, is a toolkit for detecting and
 extracting metadata and structured text content from various documents using
 existing parser libraries.
 Apache Tika 0.6 contains a number of improvements and bug fixes. Details can
 be found in the changes file:
 http://www.apache.org/dist/lucene/tika/CHANGES-0.6.txt

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (NUTCH-781) Update Tika to v0.6 for the MimeType detection

2010-02-02 Thread Sami Siren (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828561#action_12828561
 ] 

Sami Siren commented on NUTCH-781:
--

{quote}
The version we had was the same as the one provided by Tika 0.4, so I suppose we
could safely rely on the Tika defaults. MimeUtil currently requires
tika-mimetypes.xml to be available on the classpath, but we could modify
that so that it uses the default version from the Tika jar if nothing can be
found in conf. Let's put that in a separate JIRA issue if we really want it. In
the meantime I'll commit v0.6 of tika-mimetypes.xml.
{quote}

OK, thanks.




[jira] Commented: (NUTCH-781) Update Tika to v0.6 for the MimeType detection

2010-02-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828968#action_12828968
 ] 

Hudson commented on NUTCH-781:
--

Integrated in Nutch-trunk #1059 (See 
[http://hudson.zones.apache.org/hudson/job/Nutch-trunk/1059/])
 : updated tika-mimetypes.xml





[jira] Commented: (NUTCH-781) Update Tika to v0.6 for the MimeType detection

2010-02-01 Thread Sami Siren (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828275#action_12828275
 ] 

Sami Siren commented on NUTCH-781:
--

Did you forget to update conf/tika-mimetypes.xml?

Related question: do we actually need our own version of the Tika config
anymore? I saw there were some old issues that were fixed in the custom version,
but I would guess those changes, if important, have already made their way into
Tika?






[jira] Commented: (NUTCH-781) Update Tika to v0.6 for the MimeType detection

2010-02-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828458#action_12828458
 ] 

Hudson commented on NUTCH-781:
--

Integrated in Nutch-trunk #1058 (See 
[http://hudson.zones.apache.org/hudson/job/Nutch-trunk/1058/])
: upgrade tika to version 0.6

