[jira] Updated: (NUTCH-739) SolrDeleteDuplications too slow when using hadoop

2009-05-28 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Lihachev updated NUTCH-739. Attachment: NUTCH-739_remove_optimize_on_solr_dedup.patch. This simple patch decreases dedup time

[jira] Created: (NUTCH-740) Configuration option to override default language for fetched pages.

2009-05-28 Thread Marcin Okraszewski (JIRA)
Configuration option to override default language for fetched pages. Key: NUTCH-740 URL: https://issues.apache.org/jira/browse/NUTCH-740 Project: Nutch Issue Type:

[jira] Updated: (NUTCH-740) Configuration option to override default language for fetched pages.

2009-05-28 Thread Marcin Okraszewski (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcin Okraszewski updated NUTCH-740. Attachment: AcceptLanguage.patch. The patch which allows overriding of Accept-Language
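At the HTTP level, overriding the default language for fetched pages means sending a different Accept-Language request header when the page is requested. A minimal Python sketch, assuming a hypothetical `build_fetch_request` helper and a hypothetical default value; the actual configuration property introduced by AcceptLanguage.patch is not visible in this preview:

```python
import urllib.request

# Hypothetical default; the property name and default value used by
# AcceptLanguage.patch are not shown in the message preview.
DEFAULT_ACCEPT_LANGUAGE = "en-us,en;q=0.7,*;q=0.3"


def build_fetch_request(url, accept_language=None):
    """Build a GET request whose Accept-Language header can be overridden.

    If no override is given, fall back to the hypothetical default above.
    """
    value = accept_language or DEFAULT_ACCEPT_LANGUAGE
    return urllib.request.Request(url, headers={"Accept-Language": value})


if __name__ == "__main__":
    req = build_fetch_request("http://example.com/", accept_language="ru,en;q=0.5")
    # urllib stores header names via str.capitalize(), hence "Accept-language".
    print(req.get_header("Accept-language"))  # ru,en;q=0.5
```

A server that honors content negotiation uses this header to pick which language variant of a page to return, which is why a crawler may want it configurable rather than fixed.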

Re: Remove duplicate nutch conf files from .job file

2009-05-28 Thread Otis Gospodnetic
Hi Kirby, do you think you could add this to Nutch's JIRA? Please see http://wiki.apache.org/nutch/HowToContribute Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Kirby Bohling kirby.bohl...@gmail.com To: nutch-dev@lucene.apache.org

[jira] Updated: (NUTCH-740) Configuration option to override default language for fetched pages.

2009-05-28 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic updated NUTCH-740. Priority: Minor (was: Major); Affects Version/s: (was: 0.9.0); Fix

[jira] Commented: (NUTCH-739) SolrDeleteDuplications too slow when using hadoop

2009-05-28 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714264#action_12714264 ] Dmitry Lihachev commented on NUTCH-739: In my recrawl script I have the following lines

[jira] Commented: (NUTCH-739) SolrDeleteDuplications too slow when using hadoop

2009-05-28 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714277#action_12714277 ] Ken Krugler commented on NUTCH-739: There's another approach that works well here, and

[jira] Commented: (NUTCH-739) SolrDeleteDuplications too slow when using hadoop

2009-05-28 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714286#action_12714286 ] Otis Gospodnetic commented on NUTCH-739: Yes, external optimize calls will work, I
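An external optimize call can be issued by posting Solr's `<optimize/>` XML command to its update handler once dedup has finished. A minimal Python sketch, assuming Solr is reachable at a hypothetical base URL `http://localhost:8983/solr`; `build_optimize_request` and `optimize` are illustrative helpers, not part of Nutch:

```python
import urllib.request

# Assumption: hypothetical Solr base URL; adjust for the actual deployment.
SOLR_URL = "http://localhost:8983/solr"


def build_optimize_request(solr_url=SOLR_URL):
    """Build a POST of an <optimize/> command to Solr's XML update handler."""
    return urllib.request.Request(
        solr_url.rstrip("/") + "/update",
        data=b"<optimize/>",
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )


def optimize(solr_url=SOLR_URL, timeout=600):
    """Send the optimize command once and return the HTTP status code.

    Optimizing a large index can take a long time, hence the generous timeout.
    """
    with urllib.request.urlopen(build_optimize_request(solr_url), timeout=timeout) as resp:
        return resp.status
```

Called from a recrawl script after SolrDeleteDuplications completes, this optimizes the index once, at a point the script controls, instead of inside the dedup job.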

[jira] Commented: (NUTCH-739) SolrDeleteDuplications too slow when using hadoop

2009-05-28 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714287#action_12714287 ] Dmitry Lihachev commented on NUTCH-739: With this approach we still have few optimize

[jira] Commented: (NUTCH-739) SolrDeleteDuplications too slow when using hadoop

2009-05-28 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714288#action_12714288 ] Dmitry Lihachev commented on NUTCH-739: Am I wrong? SolrDeleteDuplications too slow

[jira] Commented: (NUTCH-739) SolrDeleteDuplications too slow when using hadoop

2009-05-28 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714290#action_12714290 ] Dmitry Lihachev commented on NUTCH-739: I think that optimizing Solr is not Hadoop