[ http://issues.apache.org/jira/browse/NUTCH-227?page=comments#action_12369660 ]
Andrzej Bialecki commented on NUTCH-227:
-
Isn't it so that QueryFilter (which is an interface) already extends
Configurable? What seems to be missing in
[ http://issues.apache.org/jira/browse/NUTCH-227?page=comments#action_12369665 ]
Marko Bauhardt commented on NUTCH-227:
--
Take a look at Extension.java, lines 151 to 154.
Object object = extensionClazz.newInstance();
if(object instanceof
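The snippet above is cut off, so what follows is only a guess at the pattern being pointed to: create the extension instance reflectively, then hand it the configuration if it implements a Configurable-style interface. The nested `Configurable` interface, the `Map`-based configuration and both sample classes are stand-ins invented for this sketch, not the actual Nutch types.

```java
import java.util.HashMap;
import java.util.Map;

class ExtensionSketch {
    // Stand-in for a Configurable-style interface (assumption, not the Nutch API).
    interface Configurable {
        void setConf(Map<String, String> conf);
    }

    // Hypothetical extension that wants to receive the configuration.
    static class SampleFilter implements Configurable {
        Map<String, String> conf;

        @Override
        public void setConf(Map<String, String> conf) {
            this.conf = conf;
        }
    }

    // Hypothetical extension that does not care about configuration.
    static class PlainFilter {
    }

    // Instantiate the class and inject the configuration only when supported.
    static Object newInstance(Class<?> clazz, Map<String, String> conf) {
        try {
            Object object = clazz.getDeclaredConstructor().newInstance();
            if (object instanceof Configurable) {
                ((Configurable) object).setConf(conf);
            }
            return object;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + clazz.getName(), e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("example.key", "example.value"); // hypothetical property
        SampleFilter filter = (SampleFilter) newInstance(SampleFilter.class, conf);
        System.out.println(filter.conf.get("example.key"));
    }
}
```

With this shape, a plugin that implements the interface gets its configuration without the caller having to know about it, and one that does not is simply left alone.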
[ http://issues.apache.org/jira/browse/NUTCH-227?page=all ]
Jerome Charron closed NUTCH-227:
Resolution: Fixed
Oops, sorry guys, and thanks for your prompt remarks.
All is in fact OK.
Basic Query Filter no more uses Configuration
In fact, my first need was to be able to configure the boost for
RawFieldQueryFilter.
The idea is then to give the user better control of boost values by
simply:
* add a setBoost(float) method to RawFieldQueryFilter.
* (add a setLowerCase(boolean) method to RawFieldQueryFilter)
* Add some
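A rough sketch of what the first two bullets could look like. `BoostableFieldFilter` is a hypothetical stand-in, not the real RawFieldQueryFilter; the default values and the `toClause` helper are assumptions made only to show the setters having an effect.

```java
import java.util.Locale;

class BoostableFieldFilter {
    private final String field;
    private float boost = 1.0f;       // assumed default
    private boolean lowerCase = true; // assumed default

    BoostableFieldFilter(String field) {
        this.field = field;
    }

    void setBoost(float boost) {
        this.boost = boost;
    }

    void setLowerCase(boolean lowerCase) {
        this.lowerCase = lowerCase;
    }

    // Render one term as a Lucene-style "field:term^boost" clause.
    String toClause(String term) {
        String value = lowerCase ? term.toLowerCase(Locale.ROOT) : term;
        return field + ":" + value + "^" + boost;
    }

    public static void main(String[] args) {
        BoostableFieldFilter filter = new BoostableFieldFilter("site");
        filter.setBoost(2.5f);
        System.out.println(filter.toClause("Example.ORG"));
    }
}
```

Plain setters keep the filter usable without any configuration object, which seems to be the point of the proposal.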
Jérôme,
+1
Having the chance to write query filters that allow more control in
general would be very helpful.
Stefan
On 09.03.2006 at 18:35, Jérôme Charron wrote:
In fact, my first need was to be able to configure the boost for
RawFieldQueryFilter.
The idea is then to give to the user a
Actually there is a property in conf: generate.max.per.host
So if you add a message in Generator.java at the appropriate place... you
have what you wish...
Gal
-----Original Message-----
From: Rod Taylor [mailto:[EMAIL PROTECTED]
Sent: Wednesday, March 08, 2006 7:28 PM
To: Nutch Developer
Hello,
I was wondering if anyone is willing to consider some changes to make
Nutch more user friendly..
like to get a general feeling of the code base, reviewing code and cleaning
up shadow variables, etc.,
Is someone doing it already? I am willing to take some time to contribute.
Are there
On Thu, 2006-03-09 at 21:51 +0200, Gal Nitzan wrote:
Actually there is a property in conf: generate.max.per.host
That has proven to be problematic.
foo.domain.com
bar.domain.com
baz.domain.com
*** Repeat up to 4 Million times for some content generator sites ***
Each of these gets a different
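One way to read the problem, as a minimal sketch: a cap keyed on the full host name treats every subdomain as its own bucket, so a site with millions of subdomains is barely limited at all. The `domainOf` helper below is a deliberately naive assumption (last two labels of the host name) and is wrong for suffixes such as co.uk; none of this is Generator.java code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class HostLimitSketch {
    // Naive grouping key: the last two labels of the host name (assumption).
    static String domainOf(String host) {
        String[] labels = host.split("\\.");
        int n = labels.length;
        return n <= 2 ? host : labels[n - 2] + "." + labels[n - 1];
    }

    // Keep at most maxPerKey hosts for each grouping key, in input order.
    static List<String> select(List<String> hosts, int maxPerKey,
                               Function<String, String> key) {
        Map<String, Integer> counts = new HashMap<>();
        List<String> selected = new ArrayList<>();
        for (String host : hosts) {
            int seen = counts.merge(key.apply(host), 1, Integer::sum);
            if (seen <= maxPerKey) {
                selected.add(host);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> hosts = Arrays.asList(
                "foo.domain.com", "bar.domain.com", "baz.domain.com");
        // Keyed by host: every subdomain passes the cap.
        System.out.println(select(hosts, 1, Function.identity()));
        // Keyed by the naive domain: the cap applies to the whole site.
        System.out.println(select(hosts, 1, HostLimitSketch::domainOf));
    }
}
```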
Rod Taylor wrote:
First is to allow for cleaning up. This consists of a new option to
updatedb which can scrub the database of all URLs which no longer
match URLFilter settings (regex-urlfilter.txt). This allows a change in
the urlfilter to be reflected against Nutch's current dataset,
Hi,
I have updated the site in the 0.7 branch with the latest trunk changes. I have
added both tutorials to the site so people will be aware of the differences.
I have also committed the DOAP file in the 0.7 branch.
The Nutch website uses branch-0.7 now.
Piotr
Hello,
I would like to release nutch 0.7.2 in a week or two. Some serious
bugfixes are already covered and I have a plan to fix one or two more.
I found an email from Doug with title [Fwd: Crawler submits forms?]
stating: This has been fixed in the mapred branch, but that patch is
not in
[ http://issues.apache.org/jira/browse/NUTCH-225?page=all ]
Piotr Kosiorowski closed NUTCH-225:
---
Resolution: Won't Fix
I have just updated the Nutch web site. It now contains both tutorials (for 0.7 and
0.8).
I have also added a note to each
Oops, sorry for ignoring this discussion - I was looking for comments in
JIRA and had already committed the change before reading your discussion.
My motivation is to have a usable version of the tutorial, kept as simple as
possible, that is versioned with the sources - only for historical purposes
- if
First is to allow for cleaning up. This consists of a new option to
updatedb which can scrub the database of all URLs which no longer
match URLFilter settings (regex-urlfilter.txt). This allows a change in
the urlfilter to be reflected against Nutch's current dataset, something
I think
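To make the scrubbing idea concrete, here is a small in-memory sketch. It assumes rules in the "+regex" / "-regex" style of regex-urlfilter.txt, with the first matching rule deciding and unmatched URLs being rejected; the `Map` standing in for the database and all names are hypothetical, and this is not the updatedb implementation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

class ScrubSketch {
    // First rule whose regex is found in the URL decides; no match means reject.
    static boolean accepts(List<String> rules, String url) {
        for (String rule : rules) {
            boolean include = rule.charAt(0) == '+';
            if (Pattern.compile(rule.substring(1)).matcher(url).find()) {
                return include;
            }
        }
        return false;
    }

    // Remove every stored URL the current rules no longer accept.
    // Returns the number of entries removed.
    static int scrub(Map<String, String> db, List<String> rules) {
        int before = db.size();
        db.keySet().removeIf(url -> !accepts(rules, url));
        return before - db.size();
    }

    public static void main(String[] args) {
        Map<String, String> db = new HashMap<>();
        db.put("http://example.com/a.html", "fetched");
        db.put("http://example.com/b.gif", "fetched");
        db.put("http://other.org/", "unfetched");
        List<String> rules = Arrays.asList("-\\.gif$", "+^http://example\\.com/");
        System.out.println(scrub(db, rules) + " removed, left: " + db.keySet());
    }
}
```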
[ http://issues.apache.org/jira/browse/NUTCH-91?page=all ]
Piotr Kosiorowski closed NUTCH-91:
--
Fix Version: 0.7.2-dev
0.8-dev
Resolution: Fixed
Committed with a small extension. Thanks.
empty encoding causes exception
+1
If we go with that idea, then the one on the website should be the
tutorial for the latest release with a link to the wiki for the dev version of
the tutorial and a note explaining that tutorials for older versions come with
the source.
Jake.
-----Original Message-----
From:
Rod Taylor wrote:
Doing the actual expunging during updatedb is better than as a separate
command for performance. As a periodic option (scrubbing content
generation or abuse sites in my case) combining with updatedb will
reduce the IO and CPU requirements. Updatedb already reads in the DB,
I'm still on 0.7*, and would welcome a new release.
Otis
----- Original Message -----
From: Piotr Kosiorowski [EMAIL PROTECTED]
To: nutch-dev@lucene.apache.org
Sent: Thursday, March 9, 2006 3:31:09 PM
Subject: Nutch 0.7.2
Hello,
I would like to release nutch 0.7.2 in a week or two. Some serious
Piotr Kosiorowski wrote:
I found an email from Doug with title [Fwd: Crawler submits forms?]
stating: This has been fixed in the mapred branch, but that patch is
not in 0.7.1. This alone might be a reason to make a 0.7.2 release.
I just want to make sure it was fixed by svn commit: r348533