[Nutch Wiki] Trivial Update of bin/nutch_crawl by LewisJohnMcgibbney

2011-06-28 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The bin/nutch_crawl page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/bin/nutch_crawl?action=diff&rev1=14&rev2=15 - Crawl is an alias for

[Nutch Wiki] Trivial Update of CommandLineOptions by LewisJohnMcgibbney

2011-06-28 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The CommandLineOptions page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/CommandLineOptions?action=diff&rev1=13&rev2=14 ||'''command'''||'''function'''||

[Nutch Wiki] Trivial Update of CommandLineOptions by LewisJohnMcgibbney

2011-06-28 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The CommandLineOptions page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/CommandLineOptions?action=diff&rev1=14&rev2=15 ||'''command'''||'''function'''||

[jira] [Created] (NUTCH-1020) Create or locate class for org.apache.nutch.tools.compat.CrawlDbConverter

2011-06-28 Thread Lewis John McGibbney (JIRA)
Create or locate class for org.apache.nutch.tools.compat.CrawlDbConverter - Key: NUTCH-1020 URL: https://issues.apache.org/jira/browse/NUTCH-1020 Project: Nutch Issue

[jira] [Commented] (NUTCH-1020) Create or locate class for org.apache.nutch.tools.compat.CrawlDbConverter

2011-06-28 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056344#comment-13056344 ] Lewis John McGibbney commented on NUTCH-1020: - I tagged this as linkdb (which

[Nutch Wiki] Trivial Update of CommandLineOptions by LewisJohnMcgibbney

2011-06-28 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The CommandLineOptions page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/CommandLineOptions?action=diff&rev1=18&rev2=19 ||'''command'''||'''function'''||

[Nutch Wiki] Trivial Update of CommandLineOptions by LewisJohnMcgibbney

2011-06-28 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The CommandLineOptions page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/CommandLineOptions?action=diff&rev1=19&rev2=20 ||[[bin/nutch plugin]]||Load a plugin and

[Nutch Wiki] Trivial Update of CommandLineOptions by LewisJohnMcgibbney

2011-06-28 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The CommandLineOptions page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/CommandLineOptions?action=diff&rev1=20&rev2=21 ||'''command'''||'''function'''||

[jira] [Updated] (NUTCH-1022) Upgrade version number of Nutch agent in conf

2011-06-28 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1022: - Attachment: NUTCH-1022-1.4.patch Upgrade version number of Nutch agent in conf

[jira] [Resolved] (NUTCH-1022) Upgrade version number of Nutch agent in conf

2011-06-28 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-1022. -- Resolution: Fixed Committed in rev. 1140619. Upgrade version number of Nutch agent in

[jira] [Updated] (NUTCH-1016) Strip UTF-8 non-character codepoints

2011-06-28 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1016: - Attachment: NUTCH-1016-1.4-3.patch New patch also includes checking for non-printable control

[jira] [Updated] (NUTCH-1016) Strip UTF-8 non-character codepoints

2011-06-28 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1016: - Attachment: (was: NUTCH-1016-1.4-2.patch) Strip UTF-8 non-character codepoints

[jira] [Updated] (NUTCH-1016) Strip UTF-8 non-character codepoints

2011-06-28 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1016: - Attachment: NUTCH-1016-1.4-4.patch Previous patch included debug line to stdout. Removed now.

[jira] [Updated] (NUTCH-1021) Migrate OutlinkExtractor from Apache ORO to java.util.regex

2011-06-28 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1021: - Attachment: NUTCH-1021-1.4.patch Here's a patch for 1.4. It compiles against trunk as well.

[jira] [Updated] (NUTCH-1021) Migrate OutlinkExtractor from Apache ORO to java.util.regex

2011-06-28 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1021: - Attachment: NUTCH-1021-1.4-2.patch Reworked patch to pass unit test. It still complains about a

[jira] [Issue Comment Edited] (NUTCH-1017) Exception getting mime type by name

2011-06-28 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056564#comment-13056564 ] Markus Jelsma edited comment on NUTCH-1017 at 6/28/11 3:12 PM:

[jira] [Updated] (NUTCH-961) Expose Tika's boilerpipe support

2011-06-28 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-961: Patch Info: [Patch Available] Expose Tika's boilerpipe support

[jira] [Commented] (NUTCH-1019) Edit comment in org.apache.nutch.crawl.Crawl to reflect removal of legacy

2011-06-28 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056715#comment-13056715 ] Lewis John McGibbney commented on NUTCH-1019: - Yes I will do when I get home

[Nutch Wiki] Update of bin/nutch mergedb by LewisJohnMcgibbney

2011-06-28 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The bin/nutch mergedb page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/bin/nutch%20mergedb Comment: Update to reflect changes in Nutch 1.3 API and classes. New

[Nutch Wiki] Trivial Update of bin/nutch mergedb by LewisJohnMcgibbney

2011-06-28 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The bin/nutch mergedb page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/bin/nutch%20mergedb?action=diff&rev1=1&rev2=2 Comment: trivial formatting Mergedb is an

[Nutch Wiki] Trivial Update of bin/nutch mergedb by LewisJohnMcgibbney

2011-06-28 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The bin/nutch mergedb page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/bin/nutch%20mergedb?action=diff&rev1=2&rev2=3 Mergedb is an alias for

[jira] [Created] (NUTCH-1023) Trivial error in error message for org.apache.nutch.crawl.LinkDbReader

2011-06-28 Thread Lewis John McGibbney (JIRA)
Trivial error in error message for org.apache.nutch.crawl.LinkDbReader -- Key: NUTCH-1023 URL: https://issues.apache.org/jira/browse/NUTCH-1023 Project: Nutch Issue Type:

[jira] [Commented] (NUTCH-1023) Trivial error in error message for org.apache.nutch.crawl.LinkDbReader

2011-06-28 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056750#comment-13056750 ] Lewis John McGibbney commented on NUTCH-1023: - I will submit a patch in a

[Nutch Wiki] Update of bin/nutch readlinkdb by LewisJohnMcgibbney

2011-06-28 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The bin/nutch readlinkdb page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/bin/nutch%20readlinkdb Comment: Update to reflect Nutch 1.3 API New page: Readlinkdb is

[Nutch Wiki] Trivial Update of bin/nutch readlinkdb by LewisJohnMcgibbney

2011-06-28 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The bin/nutch readlinkdb page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/bin/nutch%20readlinkdb?action=diff&rev1=1&rev2=2 Comment: formatting Readlinkdb is an

[Nutch Wiki] Update of bin/nutch_readdb by LewisJohnMcgibbney

2011-06-28 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The bin/nutch_readdb page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/bin/nutch_readdb?action=diff&rev1=8&rev2=9 Comment: Update to reflect Nutch 1.3 API The

[Nutch Wiki] Trivial Update of bin/nutch_readdb by LewisJohnMcgibbney

2011-06-28 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The bin/nutch_readdb page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/bin/nutch_readdb?action=diff&rev1=9&rev2=10 Comment: trivial formatting Readdb is an alias

[jira] [Commented] (NUTCH-994) Fine tune Solr schema

2011-06-28 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056983#comment-13056983 ] Hudson commented on NUTCH-994: -- Integrated in Nutch-trunk #1530 (See

[jira] [Commented] (NUTCH-986) Dedup fails due to date format (long)

2011-06-28 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056984#comment-13056984 ] Hudson commented on NUTCH-986: -- Integrated in Nutch-trunk #1530 (See

[jira] [Commented] (NUTCH-995) Generate POM file using the Ivy makepom task

2011-06-28 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056985#comment-13056985 ] Hudson commented on NUTCH-995: -- Integrated in Nutch-trunk #1530 (See

[jira] [Commented] (NUTCH-967) Upgrade to Tika 0.9

2011-06-28 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056992#comment-13056992 ] Hudson commented on NUTCH-967: -- Integrated in Nutch-trunk #1530 (See

[jira] [Commented] (NUTCH-1006) meta equiv with single quotes not accepted

2011-06-28 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056990#comment-13056990 ] Hudson commented on NUTCH-1006: --- Integrated in Nutch-trunk #1530 (See

[jira] [Commented] (NUTCH-999) Normalise String representation for Dates in IndexingFilters

2011-06-28 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056988#comment-13056988 ] Hudson commented on NUTCH-999: -- Integrated in Nutch-trunk #1530 (See

[jira] [Commented] (NUTCH-991) SolrDedup must issue a commit

2011-06-28 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056991#comment-13056991 ] Hudson commented on NUTCH-991: -- Integrated in Nutch-trunk #1530 (See

[jira] [Commented] (NUTCH-888) Remove parse-rss

2011-06-28 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056993#comment-13056993 ] Hudson commented on NUTCH-888: -- Integrated in Nutch-trunk #1530 (See