[jira] [Commented] (NUTCH-1925) Upgrade Tika to version 1.7
[ https://issues.apache.org/jira/browse/NUTCH-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14337814#comment-14337814 ]

Chris A. Mattmann commented on NUTCH-1925:

Great work!

Upgrade Tika to version 1.7
---------------------------
Key: NUTCH-1925
URL: https://issues.apache.org/jira/browse/NUTCH-1925
Project: Nutch
Issue Type: Improvement
Components: build
Reporter: Tyler Palsulich
Assignee: Markus Jelsma
Priority: Blocker
Fix For: 1.10, 2.3.1
Attachments: NUTCH-1925-2x.patch, NUTCH-1925.palsulich.p2.patch, NUTCH-1925.palsulich.p2.v2.patch, NUTCH-1925.palsulich.patch, NUTCH-1925.palsulich.v2.patch

Hi Folks. Nutch currently uses version 1.6 of Tika. There were no significant API changes between 1.6 and 1.7. So, this should be a one line update.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (NUTCH-1925) Upgrade Tika to version 1.7
[ https://issues.apache.org/jira/browse/NUTCH-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332350#comment-14332350 ]

Hudson commented on NUTCH-1925:

SUCCESS: Integrated in Nutch-nutchgora #1347 (See [https://builds.apache.org/job/Nutch-nutchgora/1347/])
NUTCH-1925 Upgrade Tika to version 1.7 (lewismc: http://svn.apache.org/viewvc/nutch/branches/2.x/?view=rev&rev=1661539)
* /nutch/branches/2.x/CHANGES.txt
* /nutch/branches/2.x/src/plugin/parse-tika/src/java/org/apache/nutch/parse/tika/TikaConfig.java
* /nutch/branches/2.x/src/plugin/parse-tika/src/java/org/apache/nutch/parse/tika/TikaParser.java
* /nutch/branches/2.x/src/plugin/parse-tika/src/test/org/apache/nutch/parse/tika/DOMContentUtilsTest.java
[jira] [Commented] (NUTCH-1925) Upgrade Tika to version 1.7
[ https://issues.apache.org/jira/browse/NUTCH-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332384#comment-14332384 ]

Sebastian Nagel commented on NUTCH-1925:

Great to see again successful Jenkins builds. Thanks!
[jira] [Commented] (NUTCH-1925) Upgrade Tika to version 1.7
[ https://issues.apache.org/jira/browse/NUTCH-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328194#comment-14328194 ]

Tyler Palsulich commented on NUTCH-1925:

Thanks [~wastl-nagel]. Looking into it more, org.apache.nutch.parse.tika.TikaConfig was deleted on the 1.x branch in NUTCH-1234 (see [this commit|https://github.com/apache/nutch/commit/7f44cdc998117eacc04609008fdac4ce1e2bb387#diff-a883bfa38ab4c09e2ee777564297367e]) in favor of org.apache.tika.config.TikaConfig. But, the same change was never done on the 2.x branch. I can supply a patch that does it, but it will require some API changes. That should fix the discrepancy we're seeing between 1.x and 2.x in this issue. Thoughts?
[jira] [Commented] (NUTCH-1925) Upgrade Tika to version 1.7
[ https://issues.apache.org/jira/browse/NUTCH-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328218#comment-14328218 ]

Markus Jelsma commented on NUTCH-1925:

Seems fine to me, if it passes the test.
[jira] [Commented] (NUTCH-1925) Upgrade Tika to version 1.7
[ https://issues.apache.org/jira/browse/NUTCH-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328291#comment-14328291 ]

Sebastian Nagel commented on NUTCH-1925:

Sounds good. Opened NUTCH-1945 to add a unit test for Xlsx files.
[jira] [Commented] (NUTCH-1925) Upgrade Tika to version 1.7
[ https://issues.apache.org/jira/browse/NUTCH-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325756#comment-14325756 ]

Sebastian Nagel commented on NUTCH-1925:

Hi [~tpalsulich], the patch breaks the parsing of XLSX files (src/testresources/test-mime-util/test.xlsx, cf. NUTCH-1605): the parser needs the additional hints from the URL (file name) and the content type sent in the HTTP response header. Also, it's good to keep plugins the same (as much as possible) between trunk and 2.x. What's going wrong needs further investigation; a unit test for the xlsx parser would be nice to have.
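
A note on why the hints mentioned above can matter: an XLSX file is a ZIP container, so its leading bytes alone do not distinguish it from any other ZIP archive. The sketch below is a simplified, stdlib-only illustration of that idea and is not Tika's or Nutch's actual detection logic; the class `HintedTypeGuess` and its methods are hypothetical names introduced here.

```java
import java.util.Locale;

public class HintedTypeGuess {
    // Registered media type for XLSX spreadsheets.
    static final String XLSX =
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";

    // True if the data starts with the ZIP local file header signature "PK\003\004".
    static boolean looksLikeZip(byte[] data) {
        return data != null && data.length >= 4
            && data[0] == 'P' && data[1] == 'K' && data[2] == 3 && data[3] == 4;
    }

    // Guess a media type from the leading bytes, refined by two optional hints:
    // the Content-Type header value and the file name taken from the URL.
    static String guess(byte[] data, String url, String httpContentType) {
        if (!looksLikeZip(data)) {
            return "application/octet-stream";
        }
        // Hint 1: content type from the HTTP response header, without parameters.
        if (httpContentType != null) {
            String base = httpContentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
            if (base.equals(XLSX)) {
                return XLSX;
            }
        }
        // Hint 2: file name extension from the URL, ignoring query and fragment.
        if (url != null) {
            String path = url.split("[?#]", 2)[0].toLowerCase(Locale.ROOT);
            if (path.endsWith(".xlsx")) {
                return XLSX;
            }
        }
        // Without hints, the bytes only say "some ZIP archive".
        return "application/zip";
    }

    public static void main(String[] args) {
        byte[] zipMagic = {'P', 'K', 3, 4};
        System.out.println(guess(zipMagic, null, null));
        System.out.println(guess(zipMagic, "http://example.com/test.xlsx", null));
        System.out.println(guess(zipMagic, null, XLSX + "; charset=binary"));
    }
}
```

In this toy version, dropping both hints degrades the guess to a generic ZIP type, which is one plausible way a spreadsheet-specific parser could fail to be selected.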
[jira] [Commented] (NUTCH-1925) Upgrade Tika to version 1.7
[ https://issues.apache.org/jira/browse/NUTCH-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319064#comment-14319064 ]

Markus Jelsma commented on NUTCH-1925:

Yes, I'll check it in tomorrow. Any comments on other minor issues on 1.10 before we decide on an RC?
[jira] [Commented] (NUTCH-1925) Upgrade Tika to version 1.7
[ https://issues.apache.org/jira/browse/NUTCH-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14317815#comment-14317815 ]

Markus Jelsma commented on NUTCH-1925:

Committed to trunk in revision 1659168.
[jira] [Commented] (NUTCH-1925) Upgrade Tika to version 1.7
[ https://issues.apache.org/jira/browse/NUTCH-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14317186#comment-14317186 ]

Markus Jelsma commented on NUTCH-1925:

I'll check it out and check it in tomorrow.
[jira] [Commented] (NUTCH-1925) Upgrade Tika to version 1.7
[ https://issues.apache.org/jira/browse/NUTCH-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14317132#comment-14317132 ]

Lewis John McGibbney commented on NUTCH-1925:

Any objection to committing, folks?
[jira] [Commented] (NUTCH-1925) Upgrade Tika to version 1.7
[ https://issues.apache.org/jira/browse/NUTCH-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14317187#comment-14317187 ]

Markus Jelsma commented on NUTCH-1925:

I'll check it out, and check it in tomorrow.
[jira] [Commented] (NUTCH-1925) Upgrade Tika to version 1.7
[ https://issues.apache.org/jira/browse/NUTCH-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1426#comment-1426 ]

Markus Jelsma commented on NUTCH-1925:

Tyler, can you attempt to provide a patch for the upgrade? There are, indeed, no API changes, so following Lewis' guide should work out just fine.
[jira] [Commented] (NUTCH-1925) Upgrade Tika to version 1.7
[ https://issues.apache.org/jira/browse/NUTCH-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300060#comment-14300060 ]

Tyler Palsulich commented on NUTCH-1925:

I'd be happy to, [~markus17]! But I think I'm running into NUTCH-1925 right now, so I don't want to add a patch yet.
[jira] [Commented] (NUTCH-1925) Upgrade Tika to version 1.7
[ https://issues.apache.org/jira/browse/NUTCH-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295367#comment-14295367 ]

Lewis John McGibbney commented on NUTCH-1925:

Please also see [here|https://github.com/apache/nutch/blob/trunk/src/plugin/parse-tika/howto_upgrade_tika.txt]
[jira] [Commented] (NUTCH-1925) Upgrade Tika to version 1.7
[ https://issues.apache.org/jira/browse/NUTCH-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295366#comment-14295366 ]

Lewis John McGibbney commented on NUTCH-1925:

+1 [~tpalsulich]