[jira] Commented: (NUTCH-278) Fetcher-status might need clarification: kbit/s instead of kb/s shown

2010-06-26 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12882835#action_12882835 ] Hudson commented on NUTCH-278: -- Integrated in Nutch-trunk #1189 (See

[jira] Commented: (NUTCH-832) Website menu has lots of broken links - in particular the API docs

2010-06-26 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12882837#action_12882837 ] Hudson commented on NUTCH-832: -- Integrated in Nutch-trunk #1189 (See

[jira] Commented: (NUTCH-834) Separate the Nutch web site from trunk

2010-06-30 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884177#action_12884177 ] Hudson commented on NUTCH-834: -- Integrated in Nutch-trunk #1194 (See

[jira] Commented: (NUTCH-835) document deduplication (exact duplicates) failed using MD5Signature

2010-07-01 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884540#action_12884540 ] Hudson commented on NUTCH-835: -- Integrated in Nutch-trunk #1195 (See

[jira] Commented: (NUTCH-837) Remove search servers and Lucene dependencies

2010-07-03 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884996#action_12884996 ] Hudson commented on NUTCH-837: -- Integrated in Nutch-trunk #1197 (See

[jira] Commented: (NUTCH-838) Add timing information to all Tool classes

2010-07-03 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884997#action_12884997 ] Hudson commented on NUTCH-838: -- Integrated in Nutch-trunk #1197 (See

[jira] Commented: (NUTCH-836) Remove deprecated parse plugins

2010-07-03 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884995#action_12884995 ] Hudson commented on NUTCH-836: -- Integrated in Nutch-trunk #1197 (See

[jira] [Commented] (NUTCH-994) Fine tune Solr schema

2011-06-28 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13056983#comment-13056983 ] Hudson commented on NUTCH-994: -- Integrated in Nutch-trunk #1530 (See

[jira] [Commented] (NUTCH-986) Dedup fails due to date format (long)

2011-06-28 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13056984#comment-13056984 ] Hudson commented on NUTCH-986: -- Integrated in Nutch-trunk #1530 (See

[jira] [Commented] (NUTCH-995) Generate POM file using the Ivy makepom task

2011-06-28 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13056985#comment-13056985 ] Hudson commented on NUTCH-995: -- Integrated in Nutch-trunk #1530 (See

[jira] [Commented] (NUTCH-967) Upgrade to Tika 0.9

2011-06-28 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13056992#comment-13056992 ] Hudson commented on NUTCH-967: -- Integrated in Nutch-trunk #1530 (See

[jira] [Commented] (NUTCH-1006) meta equiv with single quotes not accepted

2011-06-28 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13056990#comment-13056990 ] Hudson commented on NUTCH-1006: --- Integrated in Nutch-trunk #1530 (See

[jira] [Commented] (NUTCH-999) Normalise String representation for Dates in IndexingFilters

2011-06-28 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13056988#comment-13056988 ] Hudson commented on NUTCH-999: -- Integrated in Nutch-trunk #1530 (See

[jira] [Commented] (NUTCH-991) SolrDedup must issue a commit

2011-06-28 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13056991#comment-13056991 ] Hudson commented on NUTCH-991: -- Integrated in Nutch-trunk #1530 (See

[jira] [Commented] (NUTCH-888) Remove parse-rss

2011-06-28 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13056993#comment-13056993 ] Hudson commented on NUTCH-888: -- Integrated in Nutch-trunk #1530 (See

[jira] [Commented] (NUTCH-1013) Migrate RegexURLNormalizer from Apache ORO to java.util.regex

2011-07-04 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13059677#comment-13059677 ] Hudson commented on NUTCH-1013: --- Integrated in Nutch-trunk #1536 (See

[jira] [Commented] (NUTCH-1011) Normalize duplicate slashes in URL's

2011-07-06 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061041#comment-13061041 ] Hudson commented on NUTCH-1011: --- Integrated in Nutch-trunk #1538 (See

[jira] [Commented] (NUTCH-1027) Degrade log level of `can't find rules for scope`

2011-07-11 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13063704#comment-13063704 ] Hudson commented on NUTCH-1027: --- Integrated in Nutch-trunk #1543 (See

[jira] [Commented] (NUTCH-1043) Add pattern for filtering .js in default url filters

2011-07-18 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13067481#comment-13067481 ] Hudson commented on NUTCH-1043: --- Integrated in Nutch-trunk #1550 (See

[jira] [Commented] (NUTCH-1055) upgrade package.html file in language identifier plugin

2011-07-18 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13067482#comment-13067482 ] Hudson commented on NUTCH-1055: --- Integrated in Nutch-trunk #1550 (See

[jira] [Commented] (NUTCH-1037) Deduplicate anchors before indexing

2011-07-19 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13068149#comment-13068149 ] Hudson commented on NUTCH-1037: --- Integrated in Nutch-trunk #1551 (See

[jira] [Commented] (NUTCH-1045) MimeUtil to rely on default config provided by Tika

2011-07-25 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13070930#comment-13070930 ] Hudson commented on NUTCH-1045: --- Integrated in Nutch-trunk #1557 (See

[jira] [Commented] (NUTCH-1065) New mvn.template

2011-08-04 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13079783#comment-13079783 ] Hudson commented on NUTCH-1065: --- Integrated in Nutch-trunk #1567 (See

[jira] [Commented] (NUTCH-1099) Add HBase and Cassandra storage properties to nutch-default.xml

2011-09-11 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13102414#comment-13102414 ] Hudson commented on NUTCH-1099: --- Integrated in Nutch-trunk-ant #32 (See

[jira] [Commented] (NUTCH-1099) Add HBase and Cassandra storage properties to nutch-default.xml

2011-09-11 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13102413#comment-13102413 ] Hudson commented on NUTCH-1099: --- Integrated in Nutch-trunk #1601 (See

[jira] [Commented] (NUTCH-1114) Attr file missing in domain filter

2011-09-19 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13108360#comment-13108360 ] Hudson commented on NUTCH-1114: --- Integrated in Nutch-branch-1.4 #11 (See

[jira] [Commented] (NUTCH-1067) Configure minimum throughput for fetcher

2011-09-19 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13108359#comment-13108359 ] Hudson commented on NUTCH-1067: --- Integrated in Nutch-branch-1.4 #11 (See

[jira] [Commented] (NUTCH-1115) Option to disable fixing of embedded params in DomContentUtils

2011-09-22 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13113148#comment-13113148 ] Hudson commented on NUTCH-1115: --- Integrated in Nutch-branch-1.4 #14 (See

[jira] [Commented] (NUTCH-1074) topN is ignored with maxNumSegments

2011-09-23 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13113890#comment-13113890 ] Hudson commented on NUTCH-1074: --- Integrated in Nutch-branch-1.4 #15 (See

[jira] [Commented] (NUTCH-623) Change plugin source directory languageidentifier to language-identifier

2011-09-24 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13114129#comment-13114129 ] Hudson commented on NUTCH-623: -- Integrated in Nutch-trunk #1611 (See

[jira] [Commented] (NUTCH-623) Change plugin source directory languageidentifier to language-identifier

2011-09-26 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13114785#comment-13114785 ] Hudson commented on NUTCH-623: -- Integrated in Nutch-trunk #1613 (See

[jira] [Commented] (NUTCH-1189) add commented out default settings to gora.properties files

2012-04-26 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13263362#comment-13263362 ] Hudson commented on NUTCH-1189: --- Integrated in Nutch-nutchgora #240 (See

[jira] [Commented] (NUTCH-882) Design a Host table in GORA

2012-04-26 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13263363#comment-13263363 ] Hudson commented on NUTCH-882: -- Integrated in Nutch-nutchgora #240 (See

[jira] [Commented] (NUTCH-902) Add all necessary files and configuration so that nutch can be used with different backends out-of-the-box

2012-04-26 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13263365#comment-13263365 ] Hudson commented on NUTCH-902: -- Integrated in Nutch-nutchgora #240 (See

[jira] [Commented] (NUTCH-1340) Increase scalability by only removing markers when they actually exist for DbUpdaterReducer

2012-04-26 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13263364#comment-13263364 ] Hudson commented on NUTCH-1340: --- Integrated in Nutch-nutchgora #240 (See

[jira] [Commented] (NUTCH-1352) Improve regex urlfilters/normalizers synchronization

2012-05-08 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13271069#comment-13271069 ] Hudson commented on NUTCH-1352: --- Integrated in Nutch-nutchgora #248 (See

[jira] [Commented] (NUTCH-1349) Make batchId explicit within debug logging and improve CLI

2012-05-08 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13271068#comment-13271068 ] Hudson commented on NUTCH-1349: --- Integrated in Nutch-nutchgora #248 (See

[jira] [Commented] (NUTCH-1353) nutchgora DomainStatistics support crawlId, counter bug and reformatting

2012-05-08 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13271070#comment-13271070 ] Hudson commented on NUTCH-1353: --- Integrated in Nutch-nutchgora #248 (See

[jira] [Commented] (NUTCH-1354) nutchgora support fetcher.queue.depth.multiplier property

2012-05-08 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13271071#comment-13271071 ] Hudson commented on NUTCH-1354: --- Integrated in Nutch-nutchgora #248 (See

[jira] [Commented] (NUTCH-1355) nutchgora Configure minimum throughput for fetcher

2012-05-08 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13271072#comment-13271072 ] Hudson commented on NUTCH-1355: --- Integrated in Nutch-nutchgora #248 (See

[jira] [Commented] (NUTCH-1358) Do not accept bogus arguments

2012-05-10 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13273026#comment-13273026 ] Hudson commented on NUTCH-1358: --- Integrated in Nutch-nutchgora #249 (See

[jira] [Commented] (NUTCH-1026) Strip UTF-8 non-character codepoints

2012-05-10 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13273027#comment-13273027 ] Hudson commented on NUTCH-1026: --- Integrated in Nutch-nutchgora #249 (See

[jira] [Commented] (NUTCH-1362) Fix error handling of urls with empty fields

2012-05-11 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13273829#comment-13273829 ] Hudson commented on NUTCH-1362: --- Integrated in Nutch-nutchgora #250 (See

[jira] [Commented] (NUTCH-1366) speed up indexing by eliminating the indexreducer

2012-05-14 Thread Hudson (JIRA)
Hudson

[jira] [Commented] (NUTCH-1378) HostDb NullPointerException

2012-05-23 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282201#comment-13282201 ] Hudson commented on NUTCH-1378: --- Integrated in Nutch-nutchgora #262 (See

[jira] [Commented] (NUTCH-1381) Allow to override default subcollection field name

2012-06-07 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291218#comment-13291218 ] Hudson commented on NUTCH-1381: --- Integrated in nutch-trunk-maven #299 (See

[jira] [Commented] (NUTCH-1320) IndexChecker and ParseChecker choke on IDN's

2012-06-07 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291219#comment-13291219 ] Hudson commented on NUTCH-1320: --- Integrated in nutch-trunk-maven #299 (See

[jira] [Commented] (NUTCH-1351) DomainStatistics to aggregate by TLD

2012-06-07 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291220#comment-13291220 ] Hudson commented on NUTCH-1351: --- Integrated in nutch-trunk-maven #299 (See

[jira] [Commented] (NUTCH-1346) Follow outlinks to ignore external

2012-06-08 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291596#comment-13291596 ] Hudson commented on NUTCH-1346: --- Integrated in nutch-trunk-maven #301 (See

[jira] [Commented] (NUTCH-1336) Optionally not index db_notmodified pages

2012-06-08 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291628#comment-13291628 ] Hudson commented on NUTCH-1336: --- Integrated in nutch-trunk-maven #302 (See

[jira] [Commented] (NUTCH-1381) Allow to override default subcollection field name

2012-06-08 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291682#comment-13291682 ] Hudson commented on NUTCH-1381: --- Integrated in Nutch-trunk #1865 (See

[jira] [Commented] (NUTCH-1336) Optionally not index db_notmodified pages

2012-06-08 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291686#comment-13291686 ] Hudson commented on NUTCH-1336: --- Integrated in Nutch-trunk #1865 (See

[jira] [Commented] (NUTCH-1346) Follow outlinks to ignore external

2012-06-08 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291684#comment-13291684 ] Hudson commented on NUTCH-1346: --- Integrated in Nutch-trunk #1865 (See

[jira] [Commented] (NUTCH-1320) IndexChecker and ParseChecker choke on IDN's

2012-06-08 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291683#comment-13291683 ] Hudson commented on NUTCH-1320: --- Integrated in Nutch-trunk #1865 (See

[jira] [Commented] (NUTCH-1351) DomainStatistics to aggregate by TLD

2012-06-08 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291685#comment-13291685 ] Hudson commented on NUTCH-1351: --- Integrated in Nutch-trunk #1865 (See

[jira] [Commented] (NUTCH-1262) Map `duplicating` content-types to a single type

2012-06-11 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293039#comment-13293039 ] Hudson commented on NUTCH-1262: --- Integrated in nutch-trunk-maven #306 (See

[jira] [Commented] (NUTCH-1385) More robust plug-in order properties in nutch-site.xml

2012-06-11 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293040#comment-13293040 ] Hudson commented on NUTCH-1385: --- Integrated in nutch-trunk-maven #306 (See

[jira] [Commented] (NUTCH-1384) Typo in ParseSegment's run-method

2012-06-11 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293041#comment-13293041 ] Hudson commented on NUTCH-1384: --- Integrated in nutch-trunk-maven #306 (See

[jira] [Commented] (NUTCH-1360) Support the storing of IP address connected to when web crawling

2012-06-11 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293085#comment-13293085 ] Hudson commented on NUTCH-1360: --- Integrated in nutch-trunk-maven #307 (See

[jira] [Commented] (NUTCH-1364) Add a counter in Generator for malformed urls

2012-06-11 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293245#comment-13293245 ] Hudson commented on NUTCH-1364: --- Integrated in nutch-trunk-maven #308 (See

[jira] [Commented] (NUTCH-1360) Support the storing of IP address connected to when web crawling

2012-06-11 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1329#comment-1329 ] Hudson commented on NUTCH-1360: --- Integrated in Nutch-trunk #1868 (See

[jira] [Commented] (NUTCH-1385) More robust plug-in order properties in nutch-site.xml

2012-06-11 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293336#comment-13293336 ] Hudson commented on NUTCH-1385: --- Integrated in Nutch-trunk #1868 (See

[jira] [Commented] (NUTCH-1262) Map `duplicating` content-types to a single type

2012-06-11 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293334#comment-13293334 ] Hudson commented on NUTCH-1262: --- Integrated in Nutch-trunk #1868 (See

[jira] [Commented] (NUTCH-1384) Typo in ParseSegment's run-method

2012-06-11 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293337#comment-13293337 ] Hudson commented on NUTCH-1384: --- Integrated in Nutch-trunk #1868 (See

[jira] [Commented] (NUTCH-1364) Add a counter in Generator for malformed urls

2012-06-11 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293335#comment-13293335 ] Hudson commented on NUTCH-1364: --- Integrated in Nutch-trunk #1868 (See

[jira] [Commented] (NUTCH-1330) OutlinkDB to preserve back up

2012-06-12 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293545#comment-13293545 ] Hudson commented on NUTCH-1330: --- Integrated in nutch-trunk-maven #310 (See

[jira] [Commented] (NUTCH-1024) Dynamically set fetchInterval by MIME-type

2012-06-12 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293543#comment-13293543 ] Hudson commented on NUTCH-1024: --- Integrated in nutch-trunk-maven #310 (See

[jira] [Commented] (NUTCH-1352) Improve regex urlfilters/normalizers synchronization

2012-06-12 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293546#comment-13293546 ] Hudson commented on NUTCH-1352: --- Integrated in nutch-trunk-maven #310 (See

[jira] [Commented] (NUTCH-1300) Indexer to normalize URL's

2012-06-12 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293544#comment-13293544 ] Hudson commented on NUTCH-1300: --- Integrated in nutch-trunk-maven #310 (See

[jira] [Commented] (NUTCH-1386) Headings filter not to add empty values

2012-06-12 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293547#comment-13293547 ] Hudson commented on NUTCH-1386: --- Integrated in nutch-trunk-maven #310 (See

[jira] [Commented] (NUTCH-1356) ParseUtil use ExecutorService instead of manually thread handling.

2012-06-12 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293548#comment-13293548 ] Hudson commented on NUTCH-1356: --- Integrated in nutch-trunk-maven #310 (See

[jira] [Commented] (NUTCH-1319) HostNormalizer

2012-06-12 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293549#comment-13293549 ] Hudson commented on NUTCH-1319: --- Integrated in nutch-trunk-maven #310 (See

[jira] [Commented] (NUTCH-1398) Upgrade to Hadoop 1.0.3

2012-06-15 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13295718#comment-13295718 ] Hudson commented on NUTCH-1398: --- Integrated in nutch-trunk-maven #314 (See

[jira] [Commented] (NUTCH-1396) Upgrade to Tika 1.1

2012-06-15 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13295766#comment-13295766 ] Hudson commented on NUTCH-1396: --- Integrated in Nutch-nutchgora #281 (See

[jira] [Commented] (NUTCH-1392) -force and -resume arguments being ignored in ParserJob

2012-06-15 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13295767#comment-13295767 ] Hudson commented on NUTCH-1392: --- Integrated in Nutch-nutchgora #281 (See

[jira] [Commented] (NUTCH-1300) Indexer to normalize URL's

2012-06-15 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13295798#comment-13295798 ] Hudson commented on NUTCH-1300: --- Integrated in Nutch-trunk #1869 (See

[jira] [Commented] (NUTCH-1356) ParseUtil use ExecutorService instead of manually thread handling.

2012-06-15 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13295803#comment-13295803 ] Hudson commented on NUTCH-1356: --- Integrated in Nutch-trunk #1869 (See

[jira] [Commented] (NUTCH-1398) Upgrade to Hadoop 1.0.3

2012-06-15 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13295800#comment-13295800 ] Hudson commented on NUTCH-1398: --- Integrated in Nutch-trunk #1869 (See

[jira] [Commented] (NUTCH-1352) Improve regex urlfilters/normalizers synchronization

2012-06-15 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13295801#comment-13295801 ] Hudson commented on NUTCH-1352: --- Integrated in Nutch-trunk #1869 (See

[jira] [Commented] (NUTCH-1024) Dynamically set fetchInterval by MIME-type

2012-06-15 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13295797#comment-13295797 ] Hudson commented on NUTCH-1024: --- Integrated in Nutch-trunk #1869 (See

[jira] [Commented] (NUTCH-1386) Headings filter not to add empty values

2012-06-15 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13295802#comment-13295802 ] Hudson commented on NUTCH-1386: --- Integrated in Nutch-trunk #1869 (See

[jira] [Commented] (NUTCH-1330) OutlinkDB to preserve back up

2012-06-15 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13295799#comment-13295799 ] Hudson commented on NUTCH-1330: --- Integrated in Nutch-trunk #1869 (See

[jira] [Commented] (NUTCH-1404) Nutch script fails to find job file in deploy mode

2012-06-19 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13396778#comment-13396778 ] Hudson commented on NUTCH-1404: --- Integrated in nutch-trunk-maven #319 (See

[jira] [Commented] (NUTCH-1399) TestProtocolHttpClient fails

2012-06-19 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13397249#comment-13397249 ] Hudson commented on NUTCH-1399: --- Integrated in Nutch-nutchgora #286 (See

[jira] [Commented] (NUTCH-1401) Upgrade to Hadoop 1.0.3

2012-06-19 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13397250#comment-13397250 ] Hudson commented on NUTCH-1401: --- Integrated in Nutch-nutchgora #286 (See

[jira] [Commented] (NUTCH-1404) Nutch script fails to find job file in deploy mode

2012-06-19 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13397248#comment-13397248 ] Hudson commented on NUTCH-1404: --- Integrated in Nutch-nutchgora #286 (See

[jira] [Commented] (NUTCH-1404) Nutch script fails to find job file in deploy mode

2012-06-19 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13397258#comment-13397258 ] Hudson commented on NUTCH-1404: --- Integrated in Nutch-trunk #1874 (See

[jira] [Commented] (NUTCH-1400) Remove developer -core option for bin/nutch

2012-06-20 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13397398#comment-13397398 ] Hudson commented on NUTCH-1400: --- Integrated in nutch-trunk-maven #321 (See

[jira] [Commented] (NUTCH-1391) readdb -stats fires java.io.EOFException

2012-06-20 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13398181#comment-13398181 ] Hudson commented on NUTCH-1391: --- Integrated in Nutch-nutchgora #287 (See

[jira] [Commented] (NUTCH-1400) Remove developer -core option for bin/nutch

2012-06-20 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13398180#comment-13398180 ] Hudson commented on NUTCH-1400: --- Integrated in Nutch-nutchgora #287 (See

[jira] [Commented] (NUTCH-1400) Remove developer -core option for bin/nutch

2012-06-20 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13398183#comment-13398183 ] Hudson commented on NUTCH-1400: --- Integrated in Nutch-trunk #1875 (See

[jira] [Commented] (NUTCH-1407) BasicIndexingFilter to optionally add domain field

2012-06-25 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13400520#comment-13400520 ] Hudson commented on NUTCH-1407: --- Integrated in nutch-trunk-maven #328 (See

[jira] [Commented] (NUTCH-1251) SolrDedup to use proper Lucene catch-all query

2012-06-26 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13401269#comment-13401269 ] Hudson commented on NUTCH-1251: --- Integrated in nutch-trunk-maven #330 (See

[jira] [Commented] (NUTCH-1319) HostNormalizer

2012-06-26 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13401275#comment-13401275 ] Hudson commented on NUTCH-1319: --- Integrated in nutch-trunk-maven #331 (See

[jira] [Commented] (NUTCH-1405) Allow to overwrite CrawlDatum's with injected entries

2012-07-05 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13407353#comment-13407353 ] Hudson commented on NUTCH-1405: --- Integrated in nutch-trunk-maven #341 (See

[jira] [Commented] (NUTCH-1388) Optionally maintain custom fetch interval despite AdaptiveFetchSchedule

2012-07-20 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419205#comment-13419205 ] Hudson commented on NUTCH-1388: --- Integrated in nutch-trunk-maven #359 (See

[jira] [Commented] (NUTCH-1433) Upgrade to Tika 1.2

2012-07-20 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419206#comment-13419206 ] Hudson commented on NUTCH-1433: --- Integrated in nutch-trunk-maven #359 (See

[jira] [Commented] (NUTCH-1433) Upgrade to Tika 1.2

2012-07-20 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419307#comment-13419307 ] Hudson commented on NUTCH-1433: --- Integrated in nutch-trunk-maven #360 (See

[jira] [Commented] (NUTCH-1388) Optionally maintain custom fetch interval despite AdaptiveFetchSchedule

2012-07-20 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419744#comment-13419744 ] Hudson commented on NUTCH-1388: --- Integrated in Nutch-trunk #1903 (See

[jira] [Commented] (NUTCH-1433) Upgrade to Tika 1.2

2012-07-20 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419745#comment-13419745 ] Hudson commented on NUTCH-1433: --- Integrated in Nutch-trunk #1903 (See
