As part of the required functionality of the 0.8 release discussion
on some other thread, my opinion is to postpone any new UI functionality
(for example NUTCH-48) until the new architecture is in place.
I would not veto someone testing & committing NUTCH-48. We should
avoid investing too mu
> An end-to-end unit test would help coverage a lot: something that
> performs a simple crawl and runs queries against it. Ideally this would
> start an in-process web server serving test content, crawl that content,
> then start a web server serving queries. This could be run in both
> local and
Sami Siren wrote:
I know there are people who think that a plain xml interface is good
enough for all but I would like to give this new architecture a try.
I think this would be a great addition. The XML has a lot of uses, but
we should include a good native, extensible, skinnable search UI.
I have recently been working on refactoring the web GUI to be more
extensible and manageable by replacing the spaghetti JSP UI with a UI
layer built with Struts and Tiles. Doing so will make it much easier
to provide, for example, a plugin (extension) that will just change the
layout of t
> 1) Should we check test sources with pmd?
I don't think so.
> 2) We have oro 2.0.7 in the dependencies (I think urlfilter and similar
> things). PMD requires oro 2.0.8. Do you think we can upgrade (as far
> as I know 2.0.7 and 2.0.8 should be compatible)? We would have only one
> oro jar then.
Committed.
One can run the PMD checks with 'ant pmd'. It produces an HTML report
file in the build directory. It covers core Nutch and the plugins.
Currently it uses only the unusedcode ruleset, but one can uncomment
other rulesets in build.xml (or add others according to the PMD
documentation).
Chris Mattmann wrote:
opinion, I recently downloaded and reviewed NUTCH-61, and really like the
patch. +1 on my end. I haven't tried out NUTCH-240 yet, but it seems to be a
logical extension point for Nutch to be able to plug in different scoring
components. So, +1 from me.
Thanks for lookin
[ http://issues.apache.org/jira/browse/NUTCH-245?page=all ]
Chris A. Mattmann updated NUTCH-245:
Summary: DTD Schemas for plugin.xml configuration files in conf directory
(was: XML Schemas for xml configuration files in conf directory)
> DTD Schema
XML Schemas for xml configuration files in conf directory
-
Key: NUTCH-245
URL: http://issues.apache.org/jira/browse/NUTCH-245
Project: Nutch
Type: New Feature
Components: fetcher, indexer, ndfs, searcher, we
> Do you guys have any additional insights / suggestions whether NUTCH-240
> and/or NUTCH-61 should be included in this release?
NUTCH-240: I really like the idea, but for now, I agree that its API is
still "ugly". I would like to help in the coming weeks...
So for me it should not be included.
Hi Andrzej,
On 4/7/06 12:18 PM, "Andrzej Bialecki" <[EMAIL PROTECTED]> wrote:
> Do you guys have any additional insights / suggestions whether NUTCH-240
> and/or NUTCH-61 should be included in this release?
Looking at the JIRA popular issues pane for Nutch (
http://issues.apache.org/jira/browse
Hi Jérôme,
I find your idea very interesting, and I would be interested in contributing
to the Parse Plugins Framework. I have developed a similar one using Lucene.
The project name is Lius.
If you are interested please let me know.
On 4/7/06, Jérôme Charron <[EMAIL PROTECTED]> wrote:
>
> Hi all,
>
>
Doug Cutting wrote:
Chris Mattmann wrote:
+1 for a release sooner rather than later.
I think this is a good plan. There's no reason we can't do another
release in a month. If it is back-compatible we can call it 0.8.x
and if it's incompatible we can call it 0.9.0.
I'm going to make a Ha
Hello Christopher,
I personally do not like combining logging with severe error handling
but it is one of the features of Nutch for some time and I do not think
it causes infinite loops in normal installations. Changing it as we are
preparing to release a new version is not a good idea in my op
Did anyone get this email? Can a committer acknowledge that it has been
received?
We have been having problems with infinite loops caused by Nutch. My
theory is that the problem is related to using the log API to track
severe errors. This patch is only a few lines of code and should be
eas
Doug Cutting wrote:
So we start out committing it as an independent target, and then add it
to the "test" target? Is that the plan? If so, +1.
Exactly - I will do it over the weekend.
P.
Doug Cutting wrote:
Piotr, would you like to make this release, or should I?
I would prefer that you do it this time - I am not sure if I can find
some time next week. I would like to do some things before release though:
1) Commit clustering patch from Dawid (I took it over from Andrzej).
2
+1
On 4/7/06 10:20 AM, "Doug Cutting" <[EMAIL PROTECTED]> wrote:
> Chris Mattmann wrote:
>> +1 for a release sooner rather than later.
>
> I think this is a good plan. There's no reason we can't do another
> release in a month. If it is back-compatible we can call it 0.8.x and
> if it's inco
Jérôme Charron wrote:
I absolutely agree, Dawid. But I don't think Nutch has enough human resources
to have a Q&A person.
I will try to integrate a code coverage tool and see if it gives us some good
indications of where unit-test effort is needed.
I think more unit tests could go a long way towar
Piotr Kosiorowski wrote:
I will make it a totally separate target (so the test target does not
depend on it).
That was actually Doug's idea (and I agree with it) to stop the build
file if PMD complains about something. It's similar to testing -- if
your tests fail, the entire build file fails.
I totally agr
Andrzej Bialecki wrote:
This selection is primarily made in the while() loop in
CrawlDbReducer:45. My main objection is that selecting the "highest"
value (meaning "most recent") relies on the fact that values of status
codes in CrawlDatum are ordered according to their meaning, and they are
t
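The objection above can be illustrated with a minimal Java sketch. This is not the CrawlDbReducer code: the Entry class, its numeric codes, and the observedAt field are all hypothetical, and comparing by timestamp is only one possible alternative, not necessarily what the thread goes on to propose. The sketch shows that picking the highest raw status code and picking the latest record can disagree once the numeric order of the codes stops matching their meaning.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public final class MostRecentSelection {
    private MostRecentSelection() {}

    /** Hypothetical record; the codes are illustrative, NOT real CrawlDatum constants. */
    public static final class Entry {
        public final int statusCode;   // assumed numeric status code
        public final long observedAt;  // assumed time the status was recorded

        public Entry(int statusCode, long observedAt) {
            this.statusCode = statusCode;
            this.observedAt = observedAt;
        }
    }

    /** Fragile: correct only while "higher code" happens to mean "more recent". */
    public static Entry byHighestCode(List<Entry> entries) {
        return Collections.max(entries,
                Comparator.comparingInt((Entry e) -> e.statusCode));
    }

    /** Explicit: the result does not depend on how the status codes are numbered. */
    public static Entry byTimestamp(List<Entry> entries) {
        return Collections.max(entries,
                Comparator.comparingLong((Entry e) -> e.observedAt));
    }
}
```

With an entry (code 2, time 100) and an entry (code 1, time 200), byHighestCode returns the first and byTimestamp returns the second.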
Chris Mattmann wrote:
+1 for a release sooner rather than later.
I think this is a good plan. There's no reason we can't do another
release in a month. If it is back-compatible we can call it 0.8.x and
if it's incompatible we can call it 0.9.0.
I'm going to make a Hadoop 0.1.1 release to
>
>
> > I will make it a totally separate target (so the test target does not
> > depend on it).
>
> That was actually Doug's idea (and I agree with it) to stop the build
> file if PMD complains about something. It's similar to testing -- if
> your tests fail, the entire build file fails.
>
I totally agree with i
I do agree with Jérôme - plugins should be checked too.
This basically means modifying the fileset in the pmd task. Shouldn't be
too difficult to include all plugin sources with a single
statement.
I will make it a totally separate target (so the test target does not
depend on it).
That was actually
Hi,
The more I look at CrawlDbReducer the less I like the method it uses to
select the most recent records.
This selection is primarily made in the while() loop in
CrawlDbReducer:45. My main objection is that selecting the "highest"
value (meaning "most recent") relies on the fact that value
[ http://issues.apache.org/jira/browse/NUTCH-240?page=all ]
Andrzej Bialecki updated NUTCH-240:
Attachment: patch2.txt
Minor refactoring: passScore* methods now allow access to more data. I found
this useful when implementing a different scoring pl
Hi,
org.apache.nutch.html.Entities.encode has an annoying little bug that produces
invalid markup. The generated entity &#0; is illegal.
see:
> http://www.w3.org/International/questions/qa-controls
> The NUL (Null) control is illegal and cannot be represented by NCR or
encoded directly in markup langu
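To illustrate the kind of fix this report points toward, here is a minimal Java sketch of an encoder that drops characters markup cannot carry (such as NUL) instead of emitting a numeric character reference for them. It is not the actual org.apache.nutch.html.Entities code: the class name NcrEncoder and its methods are hypothetical, and the set of legal characters is assumed to be the Char production of XML 1.0.

```java
public final class NcrEncoder {
    private NcrEncoder() {}

    /** True if the code point matches the XML 1.0 Char production (assumed rule set). */
    public static boolean isLegalInMarkup(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Escapes markup-significant and non-ASCII characters; silently drops illegal ones. */
    public static String encode(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            if (!isLegalInMarkup(cp)) {
                continue; // NUL, other C0 controls, lone surrogates: never emit a reference
            }
            switch (cp) {
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '&':  out.append("&amp;");  break;
                case '"':  out.append("&quot;"); break;
                default:
                    if (cp < 0x80) {
                        out.appendCodePoint(cp);                // plain ASCII passes through
                    } else {
                        out.append("&#").append(cp).append(';'); // decimal character reference
                    }
            }
        }
        return out.toString();
    }
}
```

Dropping the character is one design choice; replacing it with U+FFFD or rejecting the input would be equally defensible, depending on what callers expect.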
> I will make it a totally separate target (so the test target does not
> depend on it).
+1
> The goal is to allow other developers to play with pmd easily but at the
> same time I do not want the build to be affected.
+1
> I would also like to look at the possibility of generating cross-referenced HTML
> code fro
Hi all,
While chatting with Chris Mattmann, it became evident to us that there
is a need for a new sub-project within Lucene.
For now, Lucene's sub-projects used in Nutch are:
1. Lucene-java - The basis for search technology
2. Hadoop - The distributed computing platform
3. Nutch - The sear
I do agree with Jérôme - plugins should be checked too.
I would like to integrate PMD for core and plugins over the weekend based on
Dawid's work - I will make it a totally separate target (so the test target
does not depend on it).
The goal is to allow other developers to play with pmd easily but at the
sam
> > My feeling was simply that the closer we are to Nutch-1.0, the more we need
> > some Q&A metrics (for us and for nutch users). No?
> I absolutely agree Jérôme, really. It's just that developers usually
> tend to hook up dozens of Q&A plugins and never look at what they output
> (that's the u
> > that right now it is checking only main code (without plugins?).
> Yes, that's correct -- I forgot to mention that. PMD target is hooked up
> with tests and stops the build if something fails. I thought the core
> code should be this strict; for plugins we can have more relaxed rules
-1
Since
Dawid Weiss wrote:
Could we have the clustering patch applied before the 0.8.0 release? I
know you're way busy with other things, Andrzej, maybe you'll forward
it to somebody else? It shouldn't be a difficult patch to review and
apply.
No problem, I will take care of it before the release.
Could we have the clustering patch applied before the 0.8.0 release? I
know you're way busy with other things, Andrzej, maybe you'll forward it
to somebody else? It shouldn't be a difficult patch to review and apply.
D.
Doug Cutting wrote:
TDLN wrote:
I mean, how do others keep uptodate wi
My feeling was simply that the closer we are to Nutch-1.0, the more we need
some Q&A metrics (for us and for nutch users). No?
I absolutely agree Jérôme, really. It's just that developers usually
tend to hook up dozens of Q&A plugins and never look at what they output
(that's the usual scen
Hi Piotr,
> that right now it is checking only main code (without plugins?).
Yes, that's correct -- I forgot to mention that. PMD target is hooked up
with tests and stops the build if something fails. I thought the core
code should be this strict; for plugins we can have more relaxed rules
(