[jira] Created: (NUTCH-691) Update jakarta poi jars to the most relevant version

2009-02-17 Thread Dmitry Lihachev (JIRA)
Update jakarta poi jars to the most relevant version Key: NUTCH-691 URL: https://issues.apache.org/jira/browse/NUTCH-691 Project: Nutch Issue Type: Improvement Components:

[jira] Updated: (NUTCH-691) Update jakarta poi jars to the most relevant version

2009-02-17 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Lihachev updated NUTCH-691: -- Attachment: NUTCH-691-v1-test.patch cd nutch; Update jakarta poi jars to the most relevant

[jira] Updated: (NUTCH-691) Update jakarta poi jars to the most relevant version

2009-02-17 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Lihachev updated NUTCH-691: -- Attachment: NUTCH-691-v1-test.patch Update jakarta poi jars to the most relevant version

[jira] Commented: (NUTCH-691) Update jakarta poi jars to the most relevant version

2009-02-17 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12674468#action_12674468 ] Dmitry Lihachev commented on NUTCH-691: --- Steps to reproduce NUTCH-591 (you must have

[jira] Issue Comment Edited: (NUTCH-691) Update jakarta poi jars to the most relevant version

2009-02-17 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12674468#action_12674468 ] dmitry.lihachev edited comment on NUTCH-691 at 2/17/09 9:39 PM:

[jira] Commented: (NUTCH-591) StringIndexOutOfBoundsException when extracting text from a Word document.

2009-02-17 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12674478#action_12674478 ] Dmitry Lihachev commented on NUTCH-591: --- can be resolved via NUTCH-691

[jira] Updated: (NUTCH-691) Update jakarta poi jars to the most relevant version

2009-02-17 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Lihachev updated NUTCH-691: -- Remaining Estimate: 0.25h Original Estimate: 0.25h Update jakarta poi jars to the most

[jira] Created: (NUTCH-695) incorrect mime type detection by MoreIndexingFilter plugin

2009-02-19 Thread Dmitry Lihachev (JIRA)
incorrect mime type detection by MoreIndexingFilter plugin -- Key: NUTCH-695 URL: https://issues.apache.org/jira/browse/NUTCH-695 Project: Nutch Issue Type: Bug Components:

[jira] Updated: (NUTCH-695) incorrect mime type detection by MoreIndexingFilter plugin

2009-02-19 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Lihachev updated NUTCH-695: -- Attachment: NUTCH-695_MoreIndexingFilter.patch Test case for this bug incorrect mime type

[jira] Updated: (NUTCH-695) incorrect mime type detection by MoreIndexingFilter plugin

2009-02-19 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Lihachev updated NUTCH-695: -- Attachment: NUTCH-695_MoreIndexingFilter.patch This patch fixes this bug incorrect mime type

[jira] Updated: (NUTCH-695) incorrect mime type detection by MoreIndexingFilter plugin

2009-02-19 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Lihachev updated NUTCH-695: -- Attachment: (was: NUTCH-695_MoreIndexingFilter.patch) incorrect mime type detection by

[jira] Updated: (NUTCH-695) incorrect mime type detection by MoreIndexingFilter plugin

2009-02-19 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Lihachev updated NUTCH-695: -- Attachment: NUTCH-695_TestMoreIndexingFilter.patch incorrect mime type detection by

[jira] Issue Comment Edited: (NUTCH-695) incorrect mime type detection by MoreIndexingFilter plugin

2009-02-19 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12674952#action_12674952 ] dmitry.lihachev edited comment on NUTCH-695 at 2/19/09 2:15 AM:

[jira] Issue Comment Edited: (NUTCH-695) incorrect mime type detection by MoreIndexingFilter plugin

2009-02-19 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12674953#action_12674953 ] dmitry.lihachev edited comment on NUTCH-695 at 2/19/09 2:16 AM:

[jira] Commented: (NUTCH-695) incorrect mime type detection by MoreIndexingFilter plugin

2009-02-19 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12674956#action_12674956 ] Dmitry Lihachev commented on NUTCH-695: --- thank you, Sami incorrect mime type

[jira] Commented: (NUTCH-684) Dedup support for Solr

2009-02-19 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675232#action_12675232 ] Dmitry Lihachev commented on NUTCH-684: --- This patch works for me too. Dedup support

[jira] Updated: (NUTCH-684) Dedup support for Solr

2009-02-19 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Lihachev updated NUTCH-684: -- Attachment: NUTCH-684_bin_nutch.patch patch for bin/nutch so we can write {{bin/nutch

[jira] Updated: (NUTCH-684) Dedup support for Solr

2009-02-19 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Lihachev updated NUTCH-684: -- Attachment: NUTCH-684_solrdedup_v2.patch Produce a little more log output Dedup support for

[jira] Issue Comment Edited: (NUTCH-684) Dedup support for Solr

2009-02-19 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675260#action_12675260 ] dmitry.lihachev edited comment on NUTCH-684 at 2/19/09 10:40 PM:

[jira] Updated: (NUTCH-684) Dedup support for Solr

2009-02-19 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Lihachev updated NUTCH-684: -- Attachment: (was: NUTCH-684_solrdedup_v2.patch) Dedup support for Solr

[jira] Updated: (NUTCH-684) Dedup support for Solr

2009-02-19 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Lihachev updated NUTCH-684: -- Attachment: NUTCH-684_solrdedup_v2.patch Dedup support for Solr --

[jira] Created: (NUTCH-697) Generate log output for solr indexer and dedup

2009-02-20 Thread Dmitry Lihachev (JIRA)
Generate log output for solr indexer and dedup -- Key: NUTCH-697 URL: https://issues.apache.org/jira/browse/NUTCH-697 Project: Nutch Issue Type: Improvement Components: indexer

[jira] Updated: (NUTCH-697) Generate log output for solr indexer and dedup

2009-02-20 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Lihachev updated NUTCH-697: -- Attachment: NUTCH-697_solr_logs.patch Generate log output for solr indexer and dedup

[jira] Commented: (NUTCH-684) Dedup support for Solr

2009-02-20 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675311#action_12675311 ] Dmitry Lihachev commented on NUTCH-684: --- bq. there is a silent assumption that Solr

[jira] Commented: (NUTCH-699) Add an official solr schema for solr integration

2009-02-20 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675324#action_12675324 ] Dmitry Lihachev commented on NUTCH-699: --- I think we must extend the field set for each

[jira] Commented: (NUTCH-644) RTF parser doesn't compile anymore

2009-02-24 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676227#action_12676227 ] Dmitry Lihachev commented on NUTCH-644: --- I found sources of RTFParser.jj (ASF) and

[jira] Updated: (NUTCH-644) RTF parser doesn't compile anymore

2009-02-24 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Lihachev updated NUTCH-644: -- Attachment: NUTCH-644_v3.patch RTF parser doesn't compile anymore

[jira] Created: (NUTCH-705) parse-rtf plugin

2009-02-26 Thread Dmitry Lihachev (JIRA)
parse-rtf plugin Key: NUTCH-705 URL: https://issues.apache.org/jira/browse/NUTCH-705 Project: Nutch Issue Type: New Feature Components: fetcher Affects Versions: 1.0.0 Reporter: Dmitry Lihachev

[jira] Commented: (NUTCH-705) parse-rtf plugin

2009-02-26 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677242#action_12677242 ] Dmitry Lihachev commented on NUTCH-705: --- This parser correctly handles non-ASCII input

[jira] Updated: (NUTCH-705) parse-rtf plugin

2009-02-26 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Lihachev updated NUTCH-705: -- Attachment: NUTCH-705.patch parse-rtf plugin Key:

[jira] Commented: (NUTCH-644) RTF parser doesn't compile anymore

2009-02-26 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677244#action_12677244 ] Dmitry Lihachev commented on NUTCH-644: --- this parser incorrectly handles non-ASCII

[jira] Commented: (NUTCH-705) parse-rtf plugin

2009-03-01 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677878#action_12677878 ] Dmitry Lihachev commented on NUTCH-705: --- Yes, it looks a bit like a problem... How can

[jira] Created: (NUTCH-715) Subcollection plugin doesn't work with default subcollections.xml file

2009-03-09 Thread Dmitry Lihachev (JIRA)
Subcollection plugin doesn't work with default subcollections.xml file -- Key: NUTCH-715 URL: https://issues.apache.org/jira/browse/NUTCH-715 Project: Nutch Issue Type: Bug

[jira] Updated: (NUTCH-715) Subcollection plugin doesn't work with default subcollections.xml file

2009-03-10 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Lihachev updated NUTCH-715: -- Attachment: NUTCH-715-testcase.patch Subcollection plugin doesn't work with default

[jira] Updated: (NUTCH-715) Subcollection plugin doesn't work with default subcollections.xml file

2009-03-10 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Lihachev updated NUTCH-715: -- Attachment: NUTCH-715-fix.patch Subcollection plugin doesn't work with default

[jira] Updated: (NUTCH-715) Subcollection plugin doesn't work with default subcollections.xml file

2009-03-10 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Lihachev updated NUTCH-715: -- Attachment: (was: NUTCH-715-fix.patch) Subcollection plugin doesn't work with default

[jira] Updated: (NUTCH-715) Subcollection plugin doesn't work with default subcollections.xml file

2009-03-10 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Lihachev updated NUTCH-715: -- Attachment: NUTCH-715_subcollections_fix.patch Subcollection plugin doesn't work with default

[jira] Created: (NUTCH-716) Make subcollection index filed multivalued

2009-03-10 Thread Dmitry Lihachev (JIRA)
Make subcollection index filed multivalued -- Key: NUTCH-716 URL: https://issues.apache.org/jira/browse/NUTCH-716 Project: Nutch Issue Type: Improvement Components: indexer Affects

[jira] Updated: (NUTCH-716) Make subcollection index filed multivalued

2009-03-10 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Lihachev updated NUTCH-716: -- Attachment: NUTCH-716_multivalued_subcollection.patch Make subcollection index filed

[jira] Created: (NUTCH-718) urlfilter-subnets plugin

2009-03-12 Thread Dmitry Lihachev (JIRA)
urlfilter-subnets plugin Key: NUTCH-718 URL: https://issues.apache.org/jira/browse/NUTCH-718 Project: Nutch Issue Type: New Feature Reporter: Dmitry Lihachev Priority: Minor This plugin

[jira] Updated: (NUTCH-718) urlfilter-subnets plugin

2009-03-12 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Lihachev updated NUTCH-718: -- Attachment: NUTCH-718_urlfilter_subnets.patch {code} cd nutch-trunk patch -p0

[jira] Commented: (NUTCH-699) Add an official solr schema for solr integration

2009-03-15 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12682223#action_12682223 ] Dmitry Lihachev commented on NUTCH-699: --- In some cases (e.g. when using

[jira] Commented: (NUTCH-706) Url regex normalizer

2009-03-26 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689385#action_12689385 ] Dmitry Lihachev commented on NUTCH-706: --- I think this must be changed to {code:xml}

[jira] Updated: (NUTCH-578) URL fetched with 403 is generated over and over again

2009-03-31 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Lihachev updated NUTCH-578: -- Attachment: NUTCH-578_v3.patch changes in CrawlDbReducer already applied in trunk, so patch

[jira] Updated: (NUTCH-716) Make subcollection index filed multivalued

2009-05-21 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Lihachev updated NUTCH-716: -- Fix Version/s: 1.1 Make subcollection index filed multivalued

[jira] Created: (NUTCH-737) urlnormalizer-unalias plugin

2009-05-25 Thread Dmitry Lihachev (JIRA)
urlnormalizer-unalias plugin Key: NUTCH-737 URL: https://issues.apache.org/jira/browse/NUTCH-737 Project: Nutch Issue Type: New Feature Affects Versions: 1.0.0 Reporter: Dmitry Lihachev I tried

[jira] Updated: (NUTCH-737) urlnormalizer-unalias plugin

2009-05-25 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Lihachev updated NUTCH-737: -- Priority: Minor (was: Major) urlnormalizer-unalias plugin

[jira] Updated: (NUTCH-737) urlnormalizer-unalias plugin

2009-05-26 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Lihachev updated NUTCH-737: -- Attachment: NUTCH-737_urlfilter_unalias.patch urlnormalizer-unalias plugin

[jira] Updated: (NUTCH-737) urlnormalizer-unalias plugin

2009-05-26 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Lihachev updated NUTCH-737: -- Attachment: (was: NUTCH-737_urlfilter_unalias.patch) urlnormalizer-unalias plugin

[jira] Commented: (NUTCH-702) Lazy Instanciation of Metadata in CrawlDatum

2009-05-26 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713383#action_12713383 ] Dmitry Lihachev commented on NUTCH-702: --- I caught an NPE when using this patch {code}

[jira] Created: (NUTCH-739) SolrDeleteDuplications too slow when using hadoop

2009-05-27 Thread Dmitry Lihachev (JIRA)
SolrDeleteDuplications too slow when using hadoop - Key: NUTCH-739 URL: https://issues.apache.org/jira/browse/NUTCH-739 Project: Nutch Issue Type: Bug Components: indexer Affects

[jira] Updated: (NUTCH-739) SolrDeleteDuplications too slow when using hadoop

2009-05-27 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Lihachev updated NUTCH-739: -- Description: in my environment i always have many warnings like this on the dedup step

[jira] Updated: (NUTCH-739) SolrDeleteDuplications too slow when using hadoop

2009-05-28 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Lihachev updated NUTCH-739: -- Attachment: NUTCH-739_remove_optimize_on_solr_dedup.patch This simple patch decreases dedup time

[jira] Commented: (NUTCH-739) SolrDeleteDuplications too slow when using hadoop

2009-05-28 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714264#action_12714264 ] Dmitry Lihachev commented on NUTCH-739: --- in my recrawl script I have the following lines

[jira] Commented: (NUTCH-739) SolrDeleteDuplications too slow when using hadoop

2009-05-28 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714287#action_12714287 ] Dmitry Lihachev commented on NUTCH-739: --- with this approach we still have few optimize

[jira] Commented: (NUTCH-739) SolrDeleteDuplications too slow when using hadoop

2009-05-28 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714288#action_12714288 ] Dmitry Lihachev commented on NUTCH-739: --- am I wrong? SolrDeleteDuplications too slow

[jira] Commented: (NUTCH-739) SolrDeleteDuplications too slow when using hadoop

2009-05-28 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714290#action_12714290 ] Dmitry Lihachev commented on NUTCH-739: --- I think that optimizing solr - is not hadoop

[jira] Commented: (NUTCH-739) SolrDeleteDuplications too slow when using hadoop

2009-05-29 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714346#action_12714346 ] Dmitry Lihachev commented on NUTCH-739: --- Doğacan, I agree with you about curl usage.

[jira] Commented: (NUTCH-739) SolrDeleteDuplications too slow when using hadoop

2009-05-29 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714349#action_12714349 ] Dmitry Lihachev commented on NUTCH-739: --- Ooops, sorry... Tool is Map/Reduce

[jira] Commented: (NUTCH-570) Improvement of URL Ordering in Generator.java

2010-03-30 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851710#action_12851710 ] Dmitry Lihachev commented on NUTCH-570: --- Yeah, Otis. It's just an update so it applies