[jira] [Commented] (NUTCH-1925) Upgrade Tika to version 1.7

2015-02-25 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14337814#comment-14337814
 ] 

Chris A. Mattmann commented on NUTCH-1925:
--

Great work!

 Upgrade Tika to version 1.7
 ---

 Key: NUTCH-1925
 URL: https://issues.apache.org/jira/browse/NUTCH-1925
 Project: Nutch
  Issue Type: Improvement
  Components: build
Reporter: Tyler Palsulich
Assignee: Markus Jelsma
Priority: Blocker
 Fix For: 1.10, 2.3.1

 Attachments: NUTCH-1925-2x.patch, NUTCH-1925.palsulich.p2.patch, 
 NUTCH-1925.palsulich.p2.v2.patch, NUTCH-1925.palsulich.patch, 
 NUTCH-1925.palsulich.v2.patch


 Hi folks. Nutch currently uses version 1.6 of Tika. There were no significant
 API changes between 1.6 and 1.7, so this should be a one-line update.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (NUTCH-1925) Upgrade Tika to version 1.7

2015-02-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332350#comment-14332350
 ] 

Hudson commented on NUTCH-1925:
---

SUCCESS: Integrated in Nutch-nutchgora #1347 (See 
[https://builds.apache.org/job/Nutch-nutchgora/1347/])
NUTCH-1925 Upgrade Tika to version 1.7 (lewismc: 
http://svn.apache.org/viewvc/nutch/branches/2.x/?view=rev&rev=1661539)
* /nutch/branches/2.x/CHANGES.txt
* /nutch/branches/2.x/src/plugin/parse-tika/src/java/org/apache/nutch/parse/tika/TikaConfig.java
* /nutch/branches/2.x/src/plugin/parse-tika/src/java/org/apache/nutch/parse/tika/TikaParser.java
* /nutch/branches/2.x/src/plugin/parse-tika/src/test/org/apache/nutch/parse/tika/DOMContentUtilsTest.java




[jira] [Commented] (NUTCH-1925) Upgrade Tika to version 1.7

2015-02-22 Thread Sebastian Nagel (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332384#comment-14332384
 ] 

Sebastian Nagel commented on NUTCH-1925:


Great to see successful Jenkins builds again. Thanks!



[jira] [Commented] (NUTCH-1925) Upgrade Tika to version 1.7

2015-02-19 Thread Tyler Palsulich (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328194#comment-14328194
 ] 

Tyler Palsulich commented on NUTCH-1925:


Thanks [~wastl-nagel]. Looking into it more,
org.apache.nutch.parse.tika.TikaConfig was deleted on the 1.x branch in
NUTCH-1234 (see [this commit|https://github.com/apache/nutch/commit/7f44cdc998117eacc04609008fdac4ce1e2bb387#diff-a883bfa38ab4c09e2ee777564297367e])
in favor of org.apache.tika.config.TikaConfig, but the same change was never
made on the 2.x branch. I can supply a patch that does it, but it will require
some API changes. That should fix the discrepancy we're seeing between 1.x and
2.x in this issue. Thoughts?

 Upgrade Tika to version 1.7
 ---

 Key: NUTCH-1925
 URL: https://issues.apache.org/jira/browse/NUTCH-1925
 Project: Nutch
  Issue Type: Improvement
  Components: build
Reporter: Tyler Palsulich
Assignee: Markus Jelsma
Priority: Blocker
 Fix For: 1.10, 2.3.1

 Attachments: NUTCH-1925-2x.patch, NUTCH-1925.palsulich.p2.patch, 
 NUTCH-1925.palsulich.patch, NUTCH-1925.palsulich.v2.patch


 Hi folks. Nutch currently uses version 1.6 of Tika. There were no significant
 API changes between 1.6 and 1.7, so this should be a one-line update.





[jira] [Commented] (NUTCH-1925) Upgrade Tika to version 1.7

2015-02-19 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328218#comment-14328218
 ] 

Markus Jelsma commented on NUTCH-1925:
--

Seems fine to me, if it passes the test.



[jira] [Commented] (NUTCH-1925) Upgrade Tika to version 1.7

2015-02-19 Thread Sebastian Nagel (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328291#comment-14328291
 ] 

Sebastian Nagel commented on NUTCH-1925:


Sounds good. Opened NUTCH-1945 to add a unit test for XLSX files.



[jira] [Commented] (NUTCH-1925) Upgrade Tika to version 1.7

2015-02-18 Thread Sebastian Nagel (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325756#comment-14325756
 ] 

Sebastian Nagel commented on NUTCH-1925:


Hi [~tpalsulich], the patch breaks the parsing of XLSX files
(src/testresources/test-mime-util/test.xlsx, cf. NUTCH-1605): the parser
needs the additional hints from the URL (file name) and the content type sent
in the HTTP response header. Also, it's good to keep plugins as similar as
possible between trunk and 2.x. What's going wrong needs further investigation;
a unit test for the XLSX parser would be nice to have.
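To make the role of those hints concrete, here is a hypothetical Java sketch; it is not Nutch's or Tika's detection code, and the class `XlsxHintSketch` and its methods are invented for illustration. An XLSX file is an Office Open XML package stored in a ZIP container, so its leading bytes (`PK` followed by 0x03 0x04) only identify a generic ZIP archive. The file name from the URL or the `Content-Type` response header is what narrows it down to the XLSX media type.

```java
import java.util.Locale;

/**
 * Hypothetical sketch, not Nutch's or Tika's actual detection code: it shows
 * why the URL file name and the HTTP Content-Type header help when the raw
 * bytes of an XLSX file only identify a ZIP container.
 */
public class XlsxHintSketch {

  static final String ZIP = "application/zip";
  static final String XLSX =
      "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";
  static final String UNKNOWN = "application/octet-stream";

  /** True if the content starts with the ZIP local file header: 'P', 'K', 3, 4. */
  static boolean looksLikeZip(byte[] content) {
    return content.length >= 4
        && content[0] == 'P' && content[1] == 'K'
        && content[2] == 3 && content[3] == 4;
  }

  /**
   * Picks a media type from the bytes, then uses the optional hints
   * (file name from the URL, Content-Type response header) to refine a
   * generic ZIP result. Non-ZIP content is out of scope for this sketch.
   */
  static String detect(byte[] content, String fileName, String contentType) {
    if (!looksLikeZip(content)) {
      return UNKNOWN;
    }
    // The magic bytes alone only say "ZIP archive".
    if (contentType != null
        && contentType.toLowerCase(Locale.ROOT).startsWith(XLSX)) {
      return XLSX; // server-declared type, possibly with a "; charset=..." suffix
    }
    if (fileName != null
        && fileName.toLowerCase(Locale.ROOT).endsWith(".xlsx")) {
      return XLSX; // extension taken from the URL path
    }
    return ZIP;
  }

  public static void main(String[] args) {
    byte[] zipHeader = {'P', 'K', 3, 4, 20, 0};
    System.out.println(detect(zipHeader, null, null));        // application/zip
    System.out.println(detect(zipHeader, "test.xlsx", null)); // the XLSX media type
    System.out.println(detect(zipHeader, null, XLSX));        // the XLSX media type
  }
}
```

Checking the header before the file extension is an arbitrary choice in this sketch; a real detector such as Tika's combines these hints and may also inspect the ZIP entries themselves.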



[jira] [Commented] (NUTCH-1925) Upgrade Tika to version 1.7

2015-02-12 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319064#comment-14319064
 ] 

Markus Jelsma commented on NUTCH-1925:
--

Yes, I'll check it in tomorrow. Any comments on other minor issues for 1.10 before
we decide on an RC?


 Upgrade Tika to version 1.7
 ---

 Key: NUTCH-1925
 URL: https://issues.apache.org/jira/browse/NUTCH-1925
 Project: Nutch
  Issue Type: Improvement
  Components: build
Reporter: Tyler Palsulich
Assignee: Markus Jelsma
Priority: Blocker
 Fix For: 1.10, 2.3.1

 Attachments: NUTCH-1925-2x.patch, NUTCH-1925.palsulich.patch, 
 NUTCH-1925.palsulich.v2.patch


 Hi folks. Nutch currently uses version 1.6 of Tika. There were no significant
 API changes between 1.6 and 1.7, so this should be a one-line update.





[jira] [Commented] (NUTCH-1925) Upgrade Tika to version 1.7

2015-02-12 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14317815#comment-14317815
 ] 

Markus Jelsma commented on NUTCH-1925:
--

Committed to trunk in revision 1659168.


 Upgrade Tika to version 1.7
 ---

 Key: NUTCH-1925
 URL: https://issues.apache.org/jira/browse/NUTCH-1925
 Project: Nutch
  Issue Type: Improvement
  Components: build
Reporter: Tyler Palsulich
Assignee: Markus Jelsma
Priority: Blocker
 Fix For: 1.10, 2.3.1

 Attachments: NUTCH-1925.palsulich.patch, NUTCH-1925.palsulich.v2.patch


 Hi folks. Nutch currently uses version 1.6 of Tika. There were no significant
 API changes between 1.6 and 1.7, so this should be a one-line update.





[jira] [Commented] (NUTCH-1925) Upgrade Tika to version 1.7

2015-02-11 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14317186#comment-14317186
 ] 

Markus Jelsma commented on NUTCH-1925:
--

I'll check it out and check it in tomorrow.



[jira] [Commented] (NUTCH-1925) Upgrade Tika to version 1.7

2015-02-11 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14317132#comment-14317132
 ] 

Lewis John McGibbney commented on NUTCH-1925:
-

Any objections to committing, folks?



[jira] [Commented] (NUTCH-1925) Upgrade Tika to version 1.7

2015-02-11 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14317187#comment-14317187
 ] 

Markus Jelsma commented on NUTCH-1925:
--

I'll check it out and check it in tomorrow.
 


[jira] [Commented] (NUTCH-1925) Upgrade Tika to version 1.7

2015-01-31 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1426#comment-1426
 ] 

Markus Jelsma commented on NUTCH-1925:
--

Tyler, can you attempt to provide a patch for the upgrade? There are, indeed,
no API changes, so following Lewis' guide should work out just fine.


 Upgrade Tika to version 1.7
 ---

 Key: NUTCH-1925
 URL: https://issues.apache.org/jira/browse/NUTCH-1925
 Project: Nutch
  Issue Type: Improvement
  Components: build
Reporter: Tyler Palsulich
Priority: Blocker
 Fix For: 2.4, 1.10


 Hi folks. Nutch currently uses version 1.6 of Tika. There were no significant
 API changes between 1.6 and 1.7, so this should be a one-line update.





[jira] [Commented] (NUTCH-1925) Upgrade Tika to version 1.7

2015-01-31 Thread Tyler Palsulich (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300060#comment-14300060
 ] 

Tyler Palsulich commented on NUTCH-1925:


I'd be happy to, [~markus17]! But I think I'm running into
NUTCH-1925 right now, so I don't want to add a patch yet.

 Upgrade Tika to version 1.7
 ---

 Key: NUTCH-1925
 URL: https://issues.apache.org/jira/browse/NUTCH-1925
 Project: Nutch
  Issue Type: Improvement
  Components: build
Reporter: Tyler Palsulich
Assignee: Markus Jelsma
Priority: Blocker
 Fix For: 1.10, 2.3.1


 Hi folks. Nutch currently uses version 1.6 of Tika. There were no significant
 API changes between 1.6 and 1.7, so this should be a one-line update.





[jira] [Commented] (NUTCH-1925) Upgrade Tika to version 1.7

2015-01-28 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295367#comment-14295367
 ] 

Lewis John McGibbney commented on NUTCH-1925:
-

Please also see 
[here|https://github.com/apache/nutch/blob/trunk/src/plugin/parse-tika/howto_upgrade_tika.txt]

 Upgrade Tika to version 1.7
 ---

 Key: NUTCH-1925
 URL: https://issues.apache.org/jira/browse/NUTCH-1925
 Project: Nutch
  Issue Type: Improvement
  Components: build
Reporter: Tyler Palsulich

 Hi folks. Nutch currently uses version 1.6 of Tika. There were no significant
 API changes between 1.6 and 1.7, so this should be a one-line update.





[jira] [Commented] (NUTCH-1925) Upgrade Tika to version 1.7

2015-01-28 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295366#comment-14295366
 ] 

Lewis John McGibbney commented on NUTCH-1925:
-

+1 [~tpalsulich]
