[jira] [Updated] (NUTCH-2222) fetch deletes all metadata except _csh_ and _rs_

2016-02-16 Thread Adnane B. (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adnane B. updated NUTCH-: - Description: This problem happens the second time I crawl a page bin/nutch inject urls/ bin/nutch

[jira] [Updated] (NUTCH-2222) fetch deletes all metadata except _csh_ and _rs_

2016-02-16 Thread Adnane B. (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adnane B. updated NUTCH-: - Description: This problem happens the second time I crawl a page bin/nutch inject urls/ bin/nutch

[jira] [Updated] (NUTCH-2222) fetch deletes all metadata except _csh_ and _rs_

2016-02-16 Thread Adnane B. (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adnane B. updated NUTCH-: - Description: This problem happens the second time I crawl a page bin/nutch inject urls/ bin/nutch

[jira] [Updated] (NUTCH-2222) fetch deletes all metadata except _csh_ and _rs_

2016-02-16 Thread Adnane B. (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adnane B. updated NUTCH-: - Description: This problem happens the second time I crawl a page bin/nutch inject urls/ bin/nutch

[jira] [Updated] (NUTCH-2222) fetch deletes all metadata except _csh_ and _rs_

2016-02-16 Thread Adnane B. (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adnane B. updated NUTCH-: - Summary: fetch deletes all metadata except _csh_ and _rs_ (was: updatedb deletes all metadata except

[jira] [Updated] (NUTCH-2222) updatedb deletes all metadata except _csh_ and _rs_

2016-02-16 Thread Adnane B. (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adnane B. updated NUTCH-: - Description: This problem happens at the second update on a crawled page **that has not changed**

[jira] [Created] (NUTCH-2222) updatedb deletes all metadata except _csh_ and _rs_

2016-02-16 Thread Adnane B. (JIRA)
Adnane B. created NUTCH-: Summary: updatedb deletes all metadata except _csh_ and _rs_ Key: NUTCH- URL: https://issues.apache.org/jira/browse/NUTCH- Project: Nutch Issue Type: Bug

[jira] [Commented] (NUTCH-961) Expose Tika's boilerpipe support

2016-02-16 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148758#comment-15148758 ] Hudson commented on NUTCH-961: -- SUCCESS: Integrated in Nutch-trunk #3347 (See

[jira] [Resolved] (NUTCH-961) Expose Tika's boilerpipe support

2016-02-16 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-961. - Resolution: Fixed Committed to trunk in revision 1730694. Thanks everyone for contributions. >

[jira] [Updated] (NUTCH-961) Expose Tika's boilerpipe support

2016-02-16 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-961: Attachment: NUTCH-961.patch Updated patch. ExtractorRepository was missing. > Expose Tika's

[jira] [Updated] (NUTCH-961) Expose Tika's boilerpipe support

2016-02-16 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-961: Fix Version/s: 1.12 > Expose Tika's boilerpipe support > > >

[jira] [Updated] (NUTCH-961) Expose Tika's boilerpipe support

2016-02-16 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-961: Affects Version/s: 1.11 > Expose Tika's boilerpipe support > > >

[jira] [Commented] (NUTCH-961) Expose Tika's boilerpipe support

2016-02-16 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148642#comment-15148642 ] Markus Jelsma commented on NUTCH-961: - Tests pass as expected and Boilerpipe as well. Will commit

[jira] [Updated] (NUTCH-961) Expose Tika's boilerpipe support

2016-02-16 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-961: Description: Tika 0.8 comes with the Boilerpipe content handler which can be used to extract

[jira] [Updated] (NUTCH-961) Expose Tika's boilerpipe support

2016-02-16 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-961: Description: Tika 0.8 comes with the Boilerpipe content handler which can be used to extract

[jira] [Commented] (NUTCH-2210) Upgrade to Tika 1.12

2016-02-16 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148631#comment-15148631 ] Hudson commented on NUTCH-2210: --- SUCCESS: Integrated in Nutch-trunk #3346 (See

[jira] [Updated] (NUTCH-961) Expose Tika's boilerpipe support

2016-02-16 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-961: Attachment: NUTCH-961.patch Patch for trunk. > Expose Tika's boilerpipe support >

[jira] [Commented] (NUTCH-1233) Rely on Tika for outlink extraction

2016-02-16 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148632#comment-15148632 ] Hudson commented on NUTCH-1233: --- SUCCESS: Integrated in Nutch-trunk #3346 (See

[jira] [Resolved] (NUTCH-1233) Rely on Tika for outlink extraction

2016-02-16 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-1233. -- Resolution: Fixed Committed to trunk in revision 1730687. > Rely on Tika for outlink

[jira] [Updated] (NUTCH-1233) Rely on Tika for outlink extraction

2016-02-16 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1233: - Affects Version/s: 1.11 > Rely on Tika for outlink extraction >

[jira] [Updated] (NUTCH-1233) Rely on Tika for outlink extraction

2016-02-16 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1233: - Fix Version/s: 1.12 > Rely on Tika for outlink extraction > --- >

[jira] [Updated] (NUTCH-1233) Rely on Tika for outlink extraction

2016-02-16 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1233: - Component/s: parser > Rely on Tika for outlink extraction > --- >

[jira] [Commented] (NUTCH-1233) Rely on Tika for outlink extraction

2016-02-16 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148590#comment-15148590 ] Markus Jelsma commented on NUTCH-1233: -- Awesome! Everything works as expected since the Tika 1.12

[jira] [Resolved] (NUTCH-2210) Upgrade to Tika 1.12

2016-02-16 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-2210. -- Resolution: Fixed Committed to trunk in revision 1730686. > Upgrade to Tika 1.12 >

[jira] [Commented] (NUTCH-2210) Upgrade to Tika 1.12

2016-02-16 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148572#comment-15148572 ] Markus Jelsma commented on NUTCH-2210: -- Test passes, will commit shortly. > Upgrade to Tika 1.12 >

[jira] [Updated] (NUTCH-2210) Upgrade to Tika 1.12

2016-02-16 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2210: - Attachment: NUTCH-2210.patch Patch for trunk. > Upgrade to Tika 1.12 > > >

[jira] [Commented] (NUTCH-2197) Add solr5 solrcloud indexer support

2016-02-16 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148489#comment-15148489 ] Markus Jelsma commented on NUTCH-2197: -- Hello Arun - no, this is not applied to 2.3.1. The plugins