Andrew McNabb wrote:
I'm on break right now, and I'm hoping to have a chance to get some
stuff done. One thing I would like to do for Nutch is to use GNU Getopt
(which should be familiar to C coders out there) to make the command-line
utilities behave properly. I'm especially thinking of NDFS.
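To make the getopt idea concrete, here is a minimal sketch of POSIX-style short-option parsing in Java. It is not the API of the GNU Getopt Java port. The class name `MiniGetopt` and its fields are hypothetical. Like C `getopt`, it takes an option string in which a letter followed by `:` expects an argument. It supports clustered flags (`-ab`) and attached values (`-ofile`), and it treats `--` as the end of options. Unlike GNU getopt's default behavior, it does not permute arguments: parsing stops at the first non-option word.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical getopt-style parser; NOT the API of the GNU Getopt Java port.
// optstring follows the C convention: a letter followed by ':' takes an argument.
final class MiniGetopt {
  final Map<Character, String> options = new LinkedHashMap<>(); // flag -> value ("" for boolean flags)
  final List<String> operands = new ArrayList<>();             // remaining non-option arguments

  MiniGetopt(String optstring, String[] argv) {
    int i = 0;
    while (i < argv.length) {
      String arg = argv[i];
      if (arg.equals("--")) { i++; break; }                // explicit end of options
      if (!arg.startsWith("-") || arg.equals("-")) break;  // first operand stops parsing (POSIX rule)
      // Walk the letters of one word, so "-ab" is treated as "-a -b".
      for (int j = 1; j < arg.length(); j++) {
        char c = arg.charAt(j);
        int pos = optstring.indexOf(c);
        if (pos < 0 || c == ':') throw new IllegalArgumentException("unknown option -" + c);
        boolean takesArg = pos + 1 < optstring.length() && optstring.charAt(pos + 1) == ':';
        if (!takesArg) { options.put(c, ""); continue; }
        if (j + 1 < arg.length()) {
          options.put(c, arg.substring(j + 1));            // attached value: -ofile
        } else if (i + 1 < argv.length) {
          options.put(c, argv[++i]);                       // value in the next word: -o file
        } else {
          throw new IllegalArgumentException("option -" + c + " requires an argument");
        }
        break;                                             // the rest of this word was the value
      }
      i++;
    }
    for (; i < argv.length; i++) operands.add(argv[i]);
  }
}
```

For example, `new MiniGetopt("vo:", args)` accepts `-v -o out in1 in2`, mapping `v` to an empty string and `o` to `out`, and leaving `in1` and `in2` as operands.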
[ http://issues.apache.org/jira/browse/NUTCH-139?page=all ]
Jerome Charron updated NUTCH-139:
Attachment: NUTCH-139.jc.review.patch.txt
Here is a new patch from Chris. I reviewed and tested it.
From my point of view, everything seems to be OK.
So if no
Thanks Stefan, I will try that:)
/Jack
On 12/20/05, Stefan Groschupf [EMAIL PROTECTED] wrote:
This is straightforward.
But I suggest using a sorted query log instead of the index as the term
source.
Also, you can rank the results by their position in the query log
(sorted by query frequency).
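Stefan's suggestion can be sketched as follows, assuming the query log is available as a list of raw query strings. The class name `QueryLogSuggester` is hypothetical. Queries are normalized, counted, and sorted by descending frequency, so a suggestion's rank is simply its position in the sorted log.

```java
import java.util.*;
import java.util.stream.Collectors;

// Hypothetical prefix suggester backed by a query log instead of the index.
final class QueryLogSuggester {
  private final List<String> ranked; // distinct queries, most frequent first

  QueryLogSuggester(List<String> queryLog) {
    Map<String, Integer> freq = new HashMap<>();
    for (String q : queryLog) {
      String n = q.trim().toLowerCase(Locale.ROOT); // normalize before counting
      if (!n.isEmpty()) freq.merge(n, 1, Integer::sum);
    }
    ranked = new ArrayList<>(freq.keySet());
    // Higher frequency first; ties broken alphabetically for deterministic output.
    ranked.sort((a, b) -> {
      int byFreq = Integer.compare(freq.get(b), freq.get(a));
      return byFreq != 0 ? byFreq : a.compareTo(b);
    });
  }

  // Returns up to 'limit' logged queries starting with 'prefix', in rank order.
  List<String> suggest(String prefix, int limit) {
    String p = prefix.toLowerCase(Locale.ROOT);
    return ranked.stream().filter(q -> q.startsWith(p)).limit(limit).collect(Collectors.toList());
  }
}
```

A linear scan keeps the sketch short. For a large log, a sorted structure or trie keyed by prefix would avoid scanning every query.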
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360901 ]
Andrzej Bialecki commented on NUTCH-139:
I have an objection: in fact, I think the patches miss the main point of using
prefixed property names.
In this patch
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360902 ]
Jerome Charron commented on NUTCH-139:
Andrzej,
Thanks for taking the time to look at the patch.
In fact, we had some discussion with Chris about this point
(that's
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360906 ]
Jerome Charron commented on NUTCH-139:
Andrzej,
Here are more comments about my doubts and about how to handle metadata names.
if for instance a protocol plugin doesn't have
Hi,
This was mentioned before: there are many places in Nutch that rely on
static initializers. This is so-so or sometimes plainly bad, depending
on the situation.
I'm facing a problem now with URLFilters. I need to run several fetchers
inside a single VM, with different parameters such as
Andrzej Bialecki wrote:
URLFilters:
  private URLFilters(NutchConf conf) {
    // initialize plugins based on this instance of NutchConf
  }
  public static URLFilters get(NutchConf conf) {
    URLFilters res = (URLFilters)conf.get(urlfilters.key);
    if (res == null) {
      res =
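The per-configuration pattern Andrzej outlines can be sketched in self-contained form. `Conf` here is a hypothetical stand-in for NutchConf that can also cache objects; whether NutchConf offers such a cache is an assumption. The `urlfilter.prefixes` property and the prefix-matching filter are illustrative only. Each configuration lazily gets its own filter chain, so two fetchers with different configurations in one VM no longer share state set up by a static initializer.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical stand-in for NutchConf: string properties plus a per-instance object cache.
final class Conf {
  private final Properties props = new Properties();
  private final Map<String, Object> objects = new HashMap<>();
  Conf set(String name, String value) { props.setProperty(name, value); return this; }
  String get(String name, String def) { return props.getProperty(name, def); }
  Object getObject(String key) { return objects.get(key); }
  void setObject(String key, Object value) { objects.put(key, value); }
}

// Hypothetical per-configuration filter chain, created lazily instead of in a static initializer.
final class UrlFilters {
  private static final String KEY = UrlFilters.class.getName(); // cache key inside Conf
  private final String[] allowedPrefixes; // illustrative "plugin" state derived from this Conf

  private UrlFilters(Conf conf) {
    // Initialize from *this* configuration only; no global state is touched.
    allowedPrefixes = conf.get("urlfilter.prefixes", "http://").split(",");
  }

  static UrlFilters get(Conf conf) {
    synchronized (conf) { // one instance per Conf, even with concurrent callers
      UrlFilters res = (UrlFilters) conf.getObject(KEY);
      if (res == null) {
        res = new UrlFilters(conf);
        conf.setObject(KEY, res);
      }
      return res;
    }
  }

  // Returns the URL if accepted, or null if rejected (an assumption of this sketch).
  String filter(String url) {
    for (String p : allowedPrefixes) if (url.startsWith(p)) return url;
    return null;
  }
}
```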
Andrzej,
How do you choose the NutchConf to use?
Here is a short discussion I had with Doug about a kind of dynamic NutchConf
inside the same JVM:
... By looking at the mailing list archives, it seems that having some
behavior depend on the document's URL is a recurring problem (for instance
Jérôme Charron wrote:
Andrzej,
How do you choose the NutchConf to use?
It is provided as an argument to all constructors.
Here is a short discussion I had with Doug about a kind of dynamic NutchConf
inside the same JVM:
... By looking at the mailing list archives, it seems that having
Hi,
Right, this is a known problem and has been discussed several times; we should
start solving it. :-)
I suggest that we make the Plugin class implement the Configurable
interface. If a plugin needs any configuration values, it will
request them from the plugin instance.
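A minimal sketch of the Configurable idea follows. The `Configurable` interface, the `Plugin` base class, the `HttpProtocolPlugin` subclass, and the `example.timeout.ms` property are all hypothetical. Configuration is modeled as a plain `Map<String, String>` to keep the example self-contained. Because each plugin instance holds its own configuration, code asks the instance rather than a static global.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface in the setConf/getConf style; not Nutch's actual API.
interface Configurable {
  void setConf(Map<String, String> conf);
  Map<String, String> getConf();
}

// Hypothetical plugin base class: each instance owns its configuration,
// so two instances in the same JVM can be configured differently.
class Plugin implements Configurable {
  private Map<String, String> conf = new HashMap<>();
  public void setConf(Map<String, String> conf) { this.conf = new HashMap<>(conf); } // defensive copy
  public Map<String, String> getConf() { return conf; }

  // Reads an integer value from this instance's configuration, with a default.
  int getInt(String name, int defaultValue) {
    String v = conf.get(name);
    return v == null ? defaultValue : Integer.parseInt(v.trim());
  }
}

// Hypothetical extension: it asks its own plugin instance for configuration values.
class HttpProtocolPlugin extends Plugin {
  int timeoutMillis() { return getInt("example.timeout.ms", 10000); }
}
```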
The next step would
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360920 ]
Jerome Charron commented on NUTCH-139:
And why not use the fact that the ContentProperties object can now handle
multi-valued properties?
Each piece of code that
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360929 ]
Chris A. Mattmann commented on NUTCH-139:
Hi Andrzej,
I have an objection: in fact, I think the patches miss the main point of using
prefixed property names.
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360931 ]
Chris A. Mattmann commented on NUTCH-139:
Hmm,
Okay, I just finished reading the rest of the comments :-) Sorry, I just woke up
out here in Los Angeles. Okay, I
Goldschmidt, Dave wrote:
Hi Rafi,
Not sure if anyone answered this, but I think you're just after the
segslice command:
$ nutch segslice
If I understand the original request, that's only half of the answer,
but the right half. ;-)
segslice doesn't slice the Lucene indexes, only the
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360933 ]
Andrzej Bialecki commented on NUTCH-139:
I like Jerome's proposal of using the new ContentProperties class; this could
save a lot of work, especially this naming
Aha, thanks for the clarification! :-) The mergesegs command has a -i
option to index the output segment. Perhaps the SegmentSlicer command
could be modified to optionally index the output segments, too?
New question: aside from slicing URLs by a Perl5 pattern, is there a way
to slice an
mapred.job.tracker.info.port is defined 2 times in the nutch-default.xml
Key: NUTCH-146
URL: http://issues.apache.org/jira/browse/NUTCH-146
Project: Nutch
Type: Bug
Reporter: Stefan
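A check like the following could catch duplicate definitions, such as the one reported in NUTCH-146, before they are committed. It is a sketch that assumes each entry is a `<property>` element with a `<name>` child. It uses only the JDK's DOM parser, and the class name `DuplicatePropertyCheck` is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: lists property names defined more than once in a config file.
final class DuplicatePropertyCheck {
  // Returns duplicated names in first-seen order.
  static List<String> duplicates(String xml) {
    Document doc;
    try {
      doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (Exception e) {
      throw new IllegalArgumentException("cannot parse configuration XML", e);
    }
    NodeList props = doc.getElementsByTagName("property");
    Set<String> seen = new HashSet<>();
    Set<String> dups = new LinkedHashSet<>();
    for (int i = 0; i < props.getLength(); i++) {
      NodeList names = ((Element) props.item(i)).getElementsByTagName("name");
      if (names.getLength() == 0) continue; // skip malformed entries without a <name>
      String name = names.item(0).getTextContent().trim();
      if (!seen.add(name)) dups.add(name);  // add() returns false if already seen
    }
    return new ArrayList<>(dups);
  }
}
```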
Thank you, Dave, very helpful.
-Ledio
-----Original Message-----
From: Goldschmidt, Dave [mailto:[EMAIL PROTECTED]
Sent: Tuesday, December 20, 2005 7:24 AM
To: nutch-dev@lucene.apache.org
Subject: RE: [Nutch-dev] distributed search
Hi Rafi,
Not sure if anyone answered this, but I think you're
[ http://issues.apache.org/jira/browse/NUTCH-146?page=all ]
Sami Siren resolved NUTCH-146:
Fix Version: 0.8-dev
Resolution: Fixed
Assign To: Sami Siren
mapred.job.tracker.info.port is defined 2 times in the nutch-default.xml
[ http://issues.apache.org/jira/browse/NUTCH-145?page=all ]
Sami Siren resolved NUTCH-145:
Fix Version: 0.8-dev
Resolution: Fixed
Assign To: Sami Siren
this is now committed, thanks
build of war file fails on Chinese (zh) .xml files due
Hi All
The nightly build is not working:
bin/nutch admin db -create
Exception in thread main java.lang.NoClassDefFoundError: admin
nutch-2005-12-18.tar.gz 18-Dec-2005 00:54 50M
Thanks
Hi Paul,
wouldn't it be a better and maybe easier solution to have an ArrayList for
all values of a key and just add the values to the ArrayList?
Then we could have a getProperty method that returns the first value in
the list and a getProperties method that returns an array. This could be very
similar to
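Stefan's proposal can be sketched like this. `MultiValuedProperties` is a hypothetical class, not the actual ContentProperties API. Every key maps to a list of values, `getProperty` returns the first value, and `getProperties` returns all of them as an array.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical multi-valued property store; not the real ContentProperties class.
final class MultiValuedProperties {
  private final Map<String, List<String>> values = new LinkedHashMap<>();

  // Appends a value; repeated keys accumulate instead of overwriting.
  void addProperty(String key, String value) {
    values.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
  }

  // First value for the key, or null if the key is absent.
  String getProperty(String key) {
    List<String> list = values.get(key);
    return list == null || list.isEmpty() ? null : list.get(0);
  }

  // All values for the key as a fresh array (empty if absent).
  String[] getProperties(String key) {
    List<String> list = values.get(key);
    return list == null ? new String[0] : list.toArray(new String[0]);
  }
}
```

Returning a fresh array from `getProperties` keeps callers from modifying the internal list, while single-valued callers can keep using `getProperty` unchanged.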
I have recently seen the connection reset problem, and no firewall was involved.
I have been doing a mapred index build over more than 5TB of arc files and I
noticed:
SocketException: Connection reset
that occurred in 1 of 1070 map tasks during the parse phase; the task was
automatically