Re: Errors in PluginManifestParser

2006-04-25 Thread Dennis Kubes
There really is no error stack trace. It simply throws a one-line NullPointerException while trying to fetch any HTTP site because it doesn't correctly find the attributes with the new getParameter call. The oneImplementation.getElementsByName(parameter) returns empty and therefore

Re: Errors in PluginManifestParser

2006-04-25 Thread Dennis Kubes
It must not be an error with revision 394228, because the older version of the code has just started showing the same behavior as well. It was working yesterday and just started throwing the error. I am beginning to wonder if it is something within my Eclipse setup? Dennis Dennis

Re: Errors in PluginManifestParser

2006-04-25 Thread Dennis Kubes
Ok, now the 394228 revision works. Sorry for the red herring. Dennis Dennis Kubes wrote: It must not be an error with the revision 394228 because it just started showing the same behavior with the older version of the code as well. It was working yesterday and just started throwing the

Search engine project

2006-04-25 Thread omb
Hi, My name is Ayvind Binde. I came across Nutch yesterday, and I wonder if this is the best place to start if I'm going to get my project completed. You see, I'm going to make a search-engine site that indexes hyperlinks based on keyword searches. The customers should be able to log in and add

[jira] Updated: (NUTCH-255) Regular Expression for RegexUrlNormalizer to remove jsessionid

2006-04-25 Thread Dennis Kubes (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-255?page=all ] Dennis Kubes updated NUTCH-255: Attachment: urlnormalize_jessionid.patch. Patch file that adds a regular expression to the regex-normalize.xml.template file to remove jsessionid strings from URLs.
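For readers unfamiliar with the idea, here is a minimal sketch of stripping a jsessionid path parameter from a URL with a regular expression. The pattern and the class name `JsessionidStripper` are assumptions made for illustration; this is not the expression from the attached patch, and it does not use the regex-normalize.xml format.

```java
import java.util.regex.Pattern;

public class JsessionidStripper {
    // Hypothetical pattern (an assumption, not the one in the patch):
    // matches ";jsessionid=" and everything up to the next '?', '#' or '&'.
    private static final Pattern JSESSIONID =
            Pattern.compile(";jsessionid=[^?#&]*", Pattern.CASE_INSENSITIVE);

    // Returns the URL with any jsessionid path parameter removed.
    public static String strip(String url) {
        return JSESSIONID.matcher(url).replaceAll("");
    }

    public static void main(String[] args) {
        // prints "http://example.com/page.jsp?x=1"
        System.out.println(strip("http://example.com/page.jsp;jsessionid=ABC123?x=1"));
    }
}
```

Stopping the match at `?`, `#` or `&` keeps the query string and fragment intact, which is usually what a URL normalizer wants.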

CrawlDatum.metaData should never be null

2006-04-25 Thread Andrzej Bialecki
Hi, Per subject, I think it should follow the same pattern as the other metadata maps in ParseData and Content. Currently when we allocate a new CrawlDatum, metaData is null, which complicates the logic in all places that want to handle metaData. When CrawlDatum is serialized, we already check if
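To illustrate the general pattern being proposed, here is a minimal sketch of a class whose metadata map is allocated eagerly so callers never need a null check. `DatumSketch` is a hypothetical stand-in, not the real CrawlDatum, and a plain `HashMap` stands in for whatever map type Nutch actually uses.

```java
import java.util.HashMap;
import java.util.Map;

public class DatumSketch {
    // Allocated at construction time, so the field is never null.
    // (Assumption: a plain HashMap stands in for the real map type.)
    private final Map<String, String> metaData = new HashMap<>();

    // Callers can read or write without checking for null first.
    public Map<String, String> getMetaData() {
        return metaData;
    }

    public static void main(String[] args) {
        DatumSketch datum = new DatumSketch();
        datum.getMetaData().put("key", "value"); // no null check needed
        System.out.println(datum.getMetaData().size());
    }
}
```

The trade-off is one small allocation per instance in exchange for simpler calling code.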

[jira] Updated: (NUTCH-243) Some meta-refresh urls get ignored due to matching regular expression

2006-04-25 Thread Dennis Kubes (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-243?page=all ] Dennis Kubes updated NUTCH-243: Priority: Trivial (was: Minor). This is resolved by NUTCH-255. Some meta-refresh urls get ignored due to matching regular expression

Re: Nutch Parser Bug

2006-04-25 Thread Chris Mattmann
Hi Alex, I also noticed this issue a while back. It's described here: http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/200510.mbox/%3c435 [EMAIL PROTECTED] Cheers, Chris On 4/25/06 2:41 PM, Alex [EMAIL PROTECTED] wrote: Hi there, I'm fairly new to nutch and in working on the

Re: Nutch Parser Bug

2006-04-25 Thread Alex
Hi Chris, Thanks for the info! Do you know if Lucene has any plans to implement a bug fix for this in an upcoming release? Alex Chris Mattmann [EMAIL PROTECTED] wrote: Hi Alex, I also noticed this issue a while back. It's described here: