[jira] [Comment Edited] (NUTCH-961) Expose Tika's boilerpipe support

2016-01-25 Thread Tien Nguyen Manh (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116772#comment-15116772 ] Tien Nguyen Manh edited comment on NUTCH-961 at 1/26/16 6:57 AM: Ah yes,

[jira] [Commented] (NUTCH-961) Expose Tika's boilerpipe support

2016-01-25 Thread Tien Nguyen Manh (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116772#comment-15116772 ] Tien Nguyen Manh commented on NUTCH-961: Ah yes, could you explain why we need to parse it twice?

[jira] [Updated] (NUTCH-2184) Enable IndexingJob to function with no crawldb

2016-01-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2184: Attachment: NUTCH-2184v2.patch Updated patch for trunk. [~markus17], working to

[jira] [Created] (NUTCH-2206) Provide example scoring.similarity.stopword.file

2016-01-25 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2206: --- Summary: Provide example scoring.similarity.stopword.file Key: NUTCH-2206 URL: https://issues.apache.org/jira/browse/NUTCH-2206 Project: Nutch

[jira] [Commented] (NUTCH-2206) Provide example scoring.similarity.stopword.file

2016-01-25 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116491#comment-15116491 ] Lewis John McGibbney commented on NUTCH-2206: - CC [~sujenshah] > Provide example

[jira] [Created] (NUTCH-2207) Remove class duplication and smarten-up scoring-similarity plugin

2016-01-25 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2207: --- Summary: Remove class duplication and smarten-up scoring-similarity plugin Key: NUTCH-2207 URL: https://issues.apache.org/jira/browse/NUTCH-2207

[jira] [Created] (NUTCH-2205) Nutch solrdedup error in solrcloud for doc

2016-01-25 Thread VictorHu (JIRA)
VictorHu created NUTCH-2205: --- Summary: Nutch solrdedup error in solrcloud for doc Key: NUTCH-2205 URL: https://issues.apache.org/jira/browse/NUTCH-2205 Project: Nutch Issue Type: Bug

[jira] [Commented] (NUTCH-961) Expose Tika's boilerpipe support

2016-01-25 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114989#comment-15114989 ] Markus Jelsma commented on NUTCH-961: - That is probably due to the patch parsing twice. Once with BP

[jira] [Commented] (NUTCH-2205) Nutch solrdedup error in solrcloud for larger docs

2016-01-25 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114991#comment-15114991 ] Markus Jelsma commented on NUTCH-2205: -- This looks like your cluster was down, not a Nutch error. >

[jira] [Updated] (NUTCH-2205) Nutch solrdedup error in solrcloud for larger docs

2016-01-25 Thread VictorHu (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] VictorHu updated NUTCH-2205: Affects Version/s: 2.3 Environment: CentOS 6.5, JDK 1.7.0_75, Tomcat 8.0.9, Hadoop 2.5.2, Zookeeper