Re: web ui improvement

2006-04-07 Thread Sami Siren
As part of the required functionality of the 0.8 release discussion on some other thread my opinion is to postpone any new ui functionality (for example NUTCH-48) until the new architecture is in place I would not veto someone testing & committing NUTCH-48. We should avoid investing too mu

Re: Add ".settings" to svn:ignore on root Nutch folder?

2006-04-07 Thread Jérôme Charron
> An end-to-end unit test would help coverage a lot: something that > performs a simple crawl and runs queries against it. Ideally this would > start an in-process web server serving test content, crawl that content, > then start a web server serving queries. This could be run in both > local and

Re: web ui improvement

2006-04-07 Thread Doug Cutting
Sami Siren wrote: I know there are people who think that a plain xml interface is good enough for all but I would like to give this new architecture a try. I think this would be a great addition. The XML has a lot of uses, but we should include a good native, extensible, skinnable search UI.

web ui improvement

2006-04-07 Thread Sami Siren
I have recently been working on refactoring the web GUI to be more extensible and manageable by replacing the spaghetti JSP UI with a UI layer built with Struts and Tiles. By doing so it will be much easier to provide, for example, a plugin (extension) that will just change the layout of t

Re: PMD integration

2006-04-07 Thread Jérôme Charron
> 1) Should we check test sources with pmd? I don't think so. 2) We do have oro 2.0.7 in dependencies (I think urlfilter and similar > things). PMD requires oro - 2.0.8. Do you think we can upgrade (as far > as I know 2.0.7 and 2.0.8 should be compatible)? We would have only one > oro jar then.

Re: PMD integration

2006-04-07 Thread Piotr Kosiorowski
Committed. One can run the PMD checks with 'ant pmd'. It produces a file with an HTML report in the build directory. It covers core Nutch and plugins. Currently it uses only the unusedcode ruleset, but one can uncomment other rulesets in build.xml (or add others according to the PMD documentation).

Re: 0.8 release schedule (was Re: latest build throws error - critical)

2006-04-07 Thread Andrzej Bialecki
Chris Mattmann wrote: opinion, I recently downloaded and reviewed NUTCH-61, and really like the patch. +1 on my end. I haven't tried out NUTCH-240 yet, but it seems to be a logical extension point for Nutch to be able to plug in different scoring components. So, +1 from me. Thanks for lookin

[jira] Updated: (NUTCH-245) DTD Schemas for plugin.xml configuration files in conf directory

2006-04-07 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-245?page=all ] Chris A. Mattmann updated NUTCH-245: Summary: DTD Schemas for plugin.xml configuration files in conf directory (was: XML Schemas for xml configuration files in conf directory) > DTD Schema

[jira] Created: (NUTCH-245) XML Schemas for xml configuration files in conf directory

2006-04-07 Thread Chris A. Mattmann (JIRA)
XML Schemas for xml configuration files in conf directory - Key: NUTCH-245 URL: http://issues.apache.org/jira/browse/NUTCH-245 Project: Nutch Type: New Feature Components: fetcher, indexer, ndfs, searcher, we

Re: 0.8 release schedule (was Re: latest build throws error - critical)

2006-04-07 Thread Jérôme Charron
> Do you guys have any additional insights / suggestions whether NUTCH-240 > and/or NUTCH-61 should be included in this release? NUTCH-240 : I really like the idea, but for now, I agree that its API is still "ugly". I would like to help in the next weeks... So for me it should not be included

Re: 0.8 release schedule (was Re: latest build throws error - critical)

2006-04-07 Thread Chris Mattmann
Hi Andrzej, On 4/7/06 12:18 PM, "Andrzej Bialecki" <[EMAIL PROTECTED]> wrote: > Do you guys have any additional insights / suggestions whether NUTCH-240 > and/or NUTCH-61 should be included in this release? Looking at the JIRA popular issues pane for Nutch ( http://issues.apache.org/jira/browse

Re: [Proposal] New Lucene sub-project

2006-04-07 Thread Rida Benjelloun
Hi Jérôme, I found your idea very interesting. I will be interested to contribute to the Parse Plugins Framework. I have developed similar one using Lucene. The project name is Lius. If you are interested please let me know. On 4/7/06, Jérôme Charron <[EMAIL PROTECTED]> wrote: > > Hi all, > >

Re: 0.8 release schedule (was Re: latest build throws error - critical)

2006-04-07 Thread Andrzej Bialecki
Doug Cutting wrote: Chris Mattmann wrote: +1 for a release sooner rather than later. I think this is a good plan. There's no reason we can't do another release in a month. If it is back-compatible we can call it 0.8.x and if it's incompatible we can call it 0.9.0. I'm going to make a Ha

Re: Patch to remove Nutch formating from logs

2006-04-07 Thread Piotr Kosiorowski
Hello Christopher, I personally do not like combining logging with severe error handling but it is one of the features of Nutch for some time and I do not think it causes infinite loops in normal installations. Changing it as we are preparing to release a new version is not a good idea in my op

Re: Patch to remove Nutch formating from logs

2006-04-07 Thread Christopher Burkey
Did anyone get this email? Can a committer acknowledge this has been received? We have been having problems with infinite loops caused by Nutch. My theory is that the problem is related to using the log API to track severe errors. This patch is only a few lines of code and should be eas

Re: PMD integration

2006-04-07 Thread Piotr Kosiorowski
Doug Cutting wrote: So we start out committing it as an independent target, and then add it to the "test" target? Is that the plan? If so, +1. Exactly - I will do it over the weekend. P.

Re: 0.8 release schedule (was Re: latest build throws error - critical)

2006-04-07 Thread Piotr Kosiorowski
Doug Cutting wrote: Piotr, would you like to make this release, or should I? I would prefer you would do it this time - I am not sure if I can find some time next week. I would like to do some things before release though: 1) Commit clustering patch from Dawid (I took it over from Andrzej). 2

Re: 0.8 release schedule (was Re: latest build throws error - critical)

2006-04-07 Thread Chris Mattmann
+1 On 4/7/06 10:20 AM, "Doug Cutting" <[EMAIL PROTECTED]> wrote: > Chris Mattmann wrote: >> +1 for a release sooner rather than later. > > I think this is a good plan. There's no reason we can't do another > release in a month. If it is back-compatible we can call it 0.8.x and > if it's inco

Re: Add ".settings" to svn:ignore on root Nutch folder?

2006-04-07 Thread Doug Cutting
Jérôme Charron wrote: I absolutely agree Dawid. But I don't think Nutch has enough human resources to have a Q&A person. I will try to integrate a code coverage tool, and see if it gives us some good indications of the unit-testing effort needed. I think more unit tests could go a long way towar

Re: PMD integration

2006-04-07 Thread Doug Cutting
Piotr Kosiorowski wrote: I will make it a totally separate target (so tests do not depend on it). That was actually Doug's idea (and I agree with it) to stop the build file if PMD complains about something. It's similar to testing -- if your tests fail, the entire build file fails. I totally agr

Re: CrawlDbReducer - selecting data for DB update

2006-04-07 Thread Doug Cutting
Andrzej Bialecki wrote: This selection is primarily made in the while() loop in CrawlDbReducer:45. My main objection is that selecting the "highest" value (meaning "most recent") relies on the fact that values of status codes in CrawlDatum are ordered according to their meaning, and they are t

Re: 0.8 release schedule (was Re: latest build throws error - critical)

2006-04-07 Thread Doug Cutting
Chris Mattmann wrote: +1 for a release sooner rather than later. I think this is a good plan. There's no reason we can't do another release in a month. If it is back-compatible we can call it 0.8.x and if it's incompatible we can call it 0.9.0. I'm going to make a Hadoop 0.1.1 release to

Re: PMD integration

2006-04-07 Thread Piotr Kosiorowski
> > > > I will make it a totally separate target (so tests do not > > depend on it). > > That was actually Doug's idea (and I agree with it) to stop the build > file if PMD complains about something. It's similar to testing -- if > your tests fail, the entire build file fails. > I totally agree with i

Re: PMD integration

2006-04-07 Thread Dawid Weiss
I do agree with Jérôme - plugins should be checked too. This basically means modifying the fileset in the pmd task. Shouldn't be too difficult to include all plugin sources with a single statement. I will make it a totally separate target (so tests do not depend on it). That was actually

CrawlDbReducer - selecting data for DB update

2006-04-07 Thread Andrzej Bialecki
Hi, The more I look at CrawlDbReducer the less I like the method it uses to select the most recent records. This selection is primarily made in the while() loop in CrawlDbReducer:45. My main objection is that selecting the "highest" value (meaning "most recent") relies on the fact that value
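The objection above, that picking the numerically "highest" status only works while the status codes happen to be ordered by meaning, can be illustrated with a minimal Java sketch. This is not the Nutch implementation and not necessarily what the thread proposes: the class name, the status constants and their values, the `Datum` type, and the tie-break on fetch time are all hypothetical, chosen so that the numeric order deliberately disagrees with the intended priority.

```java
import java.util.List;

public final class ExplicitStatusSelection {
    // Hypothetical status codes (NOT Nutch's real values); their numeric
    // order deliberately does not match their meaning.
    public static final byte STATUS_FETCHED = 1;
    public static final byte STATUS_UNFETCHED = 4;
    public static final byte STATUS_LINKED = 7;

    /** Hypothetical stand-in for a crawl record. */
    public static final class Datum {
        public final byte status;
        public final long fetchTime;

        public Datum(byte status, long fetchTime) {
            this.status = status;
            this.fetchTime = fetchTime;
        }
    }

    /** Explicit priority of each status, independent of its numeric code. */
    public static int rank(byte status) {
        switch (status) {
            case STATUS_LINKED:    return 0; // weakest evidence
            case STATUS_UNFETCHED: return 1;
            case STATUS_FETCHED:   return 2; // strongest evidence
            default:
                throw new IllegalArgumentException("unknown status " + status);
        }
    }

    /** Picks the value with the highest rank; ties go to the later fetch time. */
    public static Datum select(List<Datum> values) {
        Datum best = null;
        for (Datum d : values) {
            if (best == null
                    || rank(d.status) > rank(best.status)
                    || (rank(d.status) == rank(best.status)
                        && d.fetchTime > best.fetchTime)) {
                best = d;
            }
        }
        return best; // null when the list is empty
    }

    private ExplicitStatusSelection() {}
}
```

With an explicit `rank` function, renumbering or adding status codes changes only one switch statement rather than silently changing which record wins.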

[jira] Updated: (NUTCH-240) Scoring API: extension point, scoring filters and an OPIC plugin

2006-04-07 Thread Andrzej Bialecki (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-240?page=all ] Andrzej Bialecki updated NUTCH-240: Attachment: patch2.txt Minor refactoring: passScore* methods now allow access to more data. I found this useful when implementing a different scoring pl

Entity &#0;

2006-04-07 Thread marcel . schnippe
Hi, org.apache.nutch.html.Entities.encode has a little annoying bug, producing wrong markup. The generated entity &#0; is illegal. see: > http://www.w3.org/International/questions/qa-controls > The NUL (Null) control is illegal and cannot be represented by NCR or encoded directly in markup langu
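As a rough illustration of the fix this report points toward, here is a minimal Java sketch of an encoder that never emits a character reference for NUL or other illegal control characters. It is an assumption-laden example, not the Nutch `encode` method: the class name `SafeEntityEncoder`, the choice to drop illegal characters silently, and the choice to emit decimal numeric references for everything above `~` are all hypothetical. The legality check follows the `Char` production of XML 1.0.

```java
public final class SafeEntityEncoder {
    private SafeEntityEncoder() {}

    /** True if the code point matches the XML 1.0 "Char" production. */
    public static boolean isLegalChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /**
     * Escapes markup characters and non-ASCII code points, and drops
     * code points (such as NUL) that may not appear in markup at all.
     */
    public static String encode(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            if (!isLegalChar(cp)) {
                continue; // assumption: illegal characters are silently dropped
            }
            switch (cp) {
                case '&': out.append("&amp;");  break;
                case '<': out.append("&lt;");   break;
                case '>': out.append("&gt;");   break;
                case '"': out.append("&quot;"); break;
                default:
                    if (cp > 0x7E) {
                        // decimal numeric character reference
                        out.append("&#").append(cp).append(';');
                    } else {
                        out.append((char) cp);
                    }
            }
        }
        return out.toString();
    }
}
```

Dropping the character is only one option; replacing it with U+FFFD or a space would be equally defensible, depending on how the output is consumed.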

Re: PMD integration

2006-04-07 Thread Jérôme Charron
> I will make it a totally separate target (so tests do not > depend on it). +1 > The goal is to allow other developers to play with pmd easily but at the > same time I do not want the build to be affected. +1 > I would also like to look at the possibility of generating cross-referenced HTML > code fro

[Proposal] New Lucene sub-project

2006-04-07 Thread Jérôme Charron
Hi all, While chatting with Chris Mattmann, it seems to be evident to us that there is a need for a new sub-project within Lucene. For now, Lucene's sub-projects used in Nutch are : 1. Lucene-java - The basis for search technology 2. Hadoop - The distributed computing platform 3. Nutch - The sear

Re: PMD integration

2006-04-07 Thread Piotr Kosiorowski
I do agree with Jérôme - plugins should be checked too. I would like to integrate PMD for core and plugins over the weekend based on Dawid's work - I will make it a totally separate target (so tests do not depend on it). The goal is to allow other developers to play with pmd easily but at the sam

Re: Add ".settings" to svn:ignore on root Nutch folder?

2006-04-07 Thread Jérôme Charron
> > My feeling was simply that the closer we are to Nutch-1.0, the more we > need > > some Q&A metrics (for us and for nutch users). No? > I absolutely agree Jérôme, really. It's just that developers usually > tend to hook up dozens of Q&A plugins and never look at what they output > (that's the u

Re: PMD integration

2006-04-07 Thread Jérôme Charron
> > that right now it is checking only main code (without plugins?). > Yes, that's correct -- I forgot to mention that. PMD target is hooked up > with tests and stops the build if something fails. I thought the core > code should be this strict; for plugins we can have more relaxed rules -1 Since

Re: 0.8 release schedule (was Re: latest build throws error - critical)

2006-04-07 Thread Andrzej Bialecki
Dawid Weiss wrote: Could we have the clustering patch applied before the 0.8.0 release? I know you're way busy with other things, Andrzej, maybe you'll forward it to somebody else? It shouldn't be a difficult patch to review and apply. No problem, I will take care of it before the release.

Re: 0.8 release schedule (was Re: latest build throws error - critical)

2006-04-07 Thread Dawid Weiss
Could we have the clustering patch applied before the 0.8.0 release? I know you're way busy with other things, Andrzej, maybe you'll forward it to somebody else? It shouldn't be a difficult patch to review and apply. D. Doug Cutting wrote: TDLN wrote: I mean, how do others keep uptodate wi

Re: Add ".settings" to svn:ignore on root Nutch folder?

2006-04-07 Thread Dawid Weiss
My feeling was simply that the closer we are to Nutch-1.0, the more we need some Q&A metrics (for us and for nutch users). No? I absolutely agree Jérôme, really. It's just that developers usually tend to hook up dozens of Q&A plugins and never look at what they output (that's the usual scen

Re: PMD integration

2006-04-07 Thread Dawid Weiss
Hi Piotr, > that right now it is checking only main code (without plugins?). Yes, that's correct -- I forgot to mention that. PMD target is hooked up with tests and stops the build if something fails. I thought the core code should be this strict; for plugins we can have more relaxed rules (