Commons HttpClient 3.0 released

2005-12-22 Thread Stefan Groschupf
Hi, Since we know that our httpclient plugin has some problems, maybe it makes sense to update to the new library. I guess this is some work, but maybe someone is interested in taking the job. :) http://www.theserverside.com/news/thread.tss?thread_id=38189 HttpClient 3.0 provides the following

Re: Commons HttpClient 3.0 released

2005-12-22 Thread Andrzej Bialecki
Stefan Groschupf wrote: Hi, Since we know that our httpclient plugin has some problems, maybe it makes sense to update to the new library. I guess this is some work, but maybe someone is interested in taking the job. :) I'll take it, thanks for the heads-up. -- Best regards, Andrzej Bialecki

Re: Static initializers

2005-12-22 Thread marcel . schnippe
Hi, This is what I did to make NutchConf behave less statically, without patching any of those 195 places Stefan mentioned. NutchConf.get() yields the current config. OpenConf sets a new current config. Finally, CloseConf closes this config. But be warned about issues with the plugin cache
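
The get/open/close pattern described above could look roughly like the sketch below. It is only an illustration, not the code from the message: the class name `ConfHolder`, the use of `java.util.Properties` as the config type, and the per-thread stack are all assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Properties;

// Hypothetical stand-in for NutchConf; NOT the actual Nutch class.
public class ConfHolder {
    // Fallback config returned when no config has been opened.
    private static final Properties DEFAULT = new Properties();

    // Assumption: one stack of open configs per thread.
    private static final ThreadLocal<Deque<Properties>> STACK =
            ThreadLocal.withInitial(ArrayDeque::new);

    // Existing callers keep calling get(); it yields the current config.
    public static Properties get() {
        Properties top = STACK.get().peek();
        return top != null ? top : DEFAULT;
    }

    // Makes the given config the current one.
    public static void openConf(Properties conf) {
        STACK.get().push(conf);
    }

    // Closes the current config and restores the previous one, if any.
    public static void closeConf() {
        Deque<Properties> stack = STACK.get();
        if (!stack.isEmpty()) {
            stack.pop();
        }
    }
}
```

A caller would typically pair `openConf` with `closeConf` in a try/finally block so the previous config is restored even when an exception is thrown. Anything that caches objects built from an earlier config (such as a plugin cache) would not be refreshed by this sketch.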

[jira] Commented: (NUTCH-148) org.apache.nutch.tools.CrawlTool throws error while doing deleteduplicates

2005-12-22 Thread Piotr Kosiorowski (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-148?page=comments#action_12361128 ] Piotr Kosiorowski commented on NUTCH-148: Do you have Cygwin installed? Is 'df' working in your Cygwin installation? Do you run crawl from the Cygwin shell? Nutch

[jira] Created: (NUTCH-149) outlinks not shown properly in cached.jsp

2005-12-22 Thread raghavendra prabhu (JIRA)
outlinks not shown properly in cached.jsp Key: NUTCH-149 URL: http://issues.apache.org/jira/browse/NUTCH-149 Project: Nutch Type: Bug Components: searcher, web gui Versions: 0.8-dev Environment: Windows XP

[jira] Commented: (NUTCH-149) outlinks not shown properly in cached.jsp

2005-12-22 Thread raghavendra prabhu (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-149?page=comments#action_12361130 ] raghavendra prabhu commented on NUTCH-149: Do the outlinks work only when the HTML has a base tag, so that the entire link can be constructed? If not, will the base

[jira] Commented: (NUTCH-61) Adaptive re-fetch interval. Detecting umodified content

2005-12-22 Thread raghavendra prabhu (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-61?page=comments#action_12361131 ] raghavendra prabhu commented on NUTCH-61: Will the same thing work for a file system? For a file system, we can directly get the modified date and store it in the db. The

[jira] Commented: (NUTCH-61) Adaptive re-fetch interval. Detecting umodified content

2005-12-22 Thread Andrzej Bialecki (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-61?page=comments#action_12361133 ] Andrzej Bialecki commented on NUTCH-61: This patch already supports this. Anyway, it needs to be significantly re-worked to fit into the current development version.

Removing old classes from trunk/

2005-12-22 Thread Andrzej Bialecki
Hi all, It's time to do some cleanup of the trunk/ after the mapred merge. I'm planning to remove the old classes in trunk/, from the following packages:
* org.apache.nutch.db.* - all classes
* org.apache.nutch.fetcher.*
* org.apache.nutch.indexer.IndexSegment
*

[jira] Updated: (NUTCH-150) OutlinkExtractor extremely slow on some non-plain text

2005-12-22 Thread Paul Baclace (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-150?page=all ] Paul Baclace updated NUTCH-150: Attachment: OutlinkExtractor.java.patch This patch has 3 changes: 1. Adds a comment that non-plain-text can be a problem. 2. Adds quantifiers to the regular
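
As a general illustration of how explicit quantifiers can limit the work a URL regex does on non-plain-text input, here is a minimal sketch. The pattern, the length limits, and the class name `BoundedUrlRegex` are assumptions chosen for the example; they are not taken from OutlinkExtractor or from the attached patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class; not part of Nutch.
public class BoundedUrlRegex {
    // Illustrative pattern: the scheme and the rest of the URL both carry
    // explicit upper bounds ({1,15} and {1,2048}) instead of open-ended
    // + or * quantifiers, so a single match attempt cannot run across an
    // arbitrarily long stretch of binary-looking input.
    private static final Pattern URL = Pattern.compile(
            "\\b[A-Za-z][A-Za-z0-9+.-]{1,15}://[^\\s<>\"]{1,2048}");

    // Returns every substring of text that matches the bounded pattern.
    public static List<String> extract(String text) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL.matcher(text);
        while (m.find()) {
            urls.add(m.group());
        }
        return urls;
    }
}
```

The bounds here are arbitrary; suitable values would depend on the input the extractor is expected to see.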