Re: GNU Getopt

2005-12-20 Thread Andrzej Bialecki
Andrew McNabb wrote: I'm on break right now, and I'm hoping to have a chance to get some stuff done. One thing I would like to do for Nutch is to use GNU Getopt (should be familiar for C coders out there) to make the command-line utilities behave properly. I'm especially thinking of NDFS.

[jira] Updated: (NUTCH-139) Standard metadata property names in the ParseData metadata

2005-12-20 Thread Jerome Charron (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=all ] Jerome Charron updated NUTCH-139: - Attachment: NUTCH-139.jc.review.patch.txt Here is a new patch from Chris. I reviewed it, tested it. From my point of view, all seems to be ok. So if no

Re: nutch and google suggestion

2005-12-20 Thread Jack Tang
Thanks Stefan, I will try that :) /Jack On 12/20/05, Stefan Groschupf [EMAIL PROTECTED] wrote: This is straightforward. But I suggest using a sorted query log instead of the index as the term source. You can also rank the results by their position in the query log (sorted by query frequency).
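A minimal sketch of the query-log idea above, assuming suggestions are prefix matches and that the log is already sorted by descending query frequency. The class `QueryLogSuggester` and its method names are hypothetical and not part of Nutch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: suggests completions from a query log that is
// already sorted by descending frequency (most frequent query first).
class QueryLogSuggester {
    private final List<String> queriesByFrequency;

    QueryLogSuggester(List<String> queriesByFrequency) {
        this.queriesByFrequency = new ArrayList<>(queriesByFrequency);
    }

    // Returns up to 'limit' logged queries starting with 'prefix'.
    // Because the log is frequency-sorted, log order is the ranking.
    List<String> suggest(String prefix, int limit) {
        List<String> out = new ArrayList<>();
        String p = prefix.toLowerCase();
        for (String q : queriesByFrequency) {
            if (out.size() >= limit) {
                break;
            }
            if (q.toLowerCase().startsWith(p)) {
                out.add(q);
            }
        }
        return out;
    }
}
```

A linear scan keeps the sketch short; a real log of any size would likely need a prefix index instead.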

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2005-12-20 Thread Andrzej Bialecki (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360901 ] Andrzej Bialecki commented on NUTCH-139: - I have an objection, in fact I think the patches miss the main point of using prefixed property names. In this patch

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2005-12-20 Thread Jerome Charron (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360902 ] Jerome Charron commented on NUTCH-139: -- Andrzej, Thanks for taking the time to look at the patch. In fact, we had some discussion with Chris about this point (that's

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2005-12-20 Thread Jerome Charron (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360906 ] Jerome Charron commented on NUTCH-139: -- Andrzej, Here are more comments about my doubts, and how to handle metadata names. If for instance a protocol plugin doesn't have

Static initializers

2005-12-20 Thread Andrzej Bialecki
Hi, This was mentioned before: there are many places in Nutch that rely on static initializers. This is so-so or sometimes plainly bad, depending on the situation. I'm facing a problem now with URLFilters. I need to run several fetchers inside a single VM, with different parameters such as

Re: Static initializers

2005-12-20 Thread Andrzej Bialecki
Andrzej Bialecki wrote: URLFilters: private URLFilters(NutchConf) { // initialize plugins based on this instance of NutchConf } public static URLFilters get(NutchConf conf) { URLFilters res = (URLFilters)conf.get(urlfilters.key); if (res == null) { res =
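The pattern outlined above, one instance cached per configuration object instead of a static initializer, can be sketched as follows. This is only an illustration under assumptions: `Conf` and `Filters` are hypothetical stand-ins for NutchConf and URLFilters, and the `getObject`/`setObject` methods, the cache key, and the `urlfilter.pattern` property are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for NutchConf: string properties plus an object cache.
class Conf {
    private final Map<String, String> props = new HashMap<>();
    private final Map<String, Object> objects = new HashMap<>();

    void set(String name, String value) { props.put(name, value); }
    String get(String name) { return props.get(name); }

    // Assumed helpers for caching per-configuration objects.
    Object getObject(String key) { return objects.get(key); }
    void setObject(String key, Object value) { objects.put(key, value); }
}

// Hypothetical stand-in for URLFilters: one instance per Conf, no static state.
class Filters {
    private static final String KEY = "urlfilters.instance"; // assumed cache key
    private final String pattern;

    private Filters(Conf conf) {
        // Initialize from this Conf instance rather than a static initializer.
        this.pattern = conf.get("urlfilter.pattern"); // assumed property name
    }

    // Returns the instance cached in 'conf', creating it on first use.
    static synchronized Filters get(Conf conf) {
        Filters res = (Filters) conf.getObject(KEY);
        if (res == null) {
            res = new Filters(conf);
            conf.setObject(KEY, res);
        }
        return res;
    }

    boolean accepts(String url) { return url.matches(pattern); }
}
```

With this shape, two fetchers in the same VM can each hold their own `Conf` and obtain independently configured filters.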

Re: Static initializers

2005-12-20 Thread Jérôme Charron
Andrzej, How do you choose the NutchConf to use? Here is a short discussion I had with Doug about a kind of dynamic NutchConf inside the same JVM: ... By looking at the mailing list archives it seems that having some behavior depending on the document's URL is a recurrent problem (for instance

Re: Static initializers

2005-12-20 Thread Andrzej Bialecki
Jérôme Charron wrote: Andrzej, How do you choose the NutchConf to use? It is provided as an argument to all constructors. Here is a short discussion I had with Doug about a kind of dynamic NutchConf inside the same JVM: ... By looking at the mailing list archives it seems that having

Re: Static initializers

2005-12-20 Thread Stefan Groschupf
Hi, right, this is a known problem that has been discussed several times; we should start solving this. :-) I suggest that we make the Plugin class implement the Configurable interface. In case a plugin needs any configuration value, it will request it from the plugin instance. The next step would

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2005-12-20 Thread Jerome Charron (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360920 ] Jerome Charron commented on NUTCH-139: -- And why not use the fact that the ContentProperties object can now handle multi-valued properties? Each piece of code that

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2005-12-20 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360929 ] Chris A. Mattmann commented on NUTCH-139: - Hi Andrzej, I have an objection, in fact I think the patches miss the main point of using prefixed property names.

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2005-12-20 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360931 ] Chris A. Mattmann commented on NUTCH-139: - Hmm, Okay, I just finished reading the rest of the comments :-) Sorry, just woke up out here in Los Angeles. Okay, I

Re: [Nutch-dev] distributed search

2005-12-20 Thread Andrzej Bialecki
Goldschmidt, Dave wrote: Hi Rafi, Not sure if anyone answered this, but I think you're just after the segslice command: $ nutch segslice If I understand the original request, that's only half of the answer, but the right half.. ;-) segslice doesn't slice the Lucene indexes, only the

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2005-12-20 Thread Andrzej Bialecki (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360933 ] Andrzej Bialecki commented on NUTCH-139: - I like Jerome's proposal of using the new ContentProperties class; this could save a lot of work, especially this naming

RE: [Nutch-dev] distributed search

2005-12-20 Thread Goldschmidt, Dave
Aha, thanks for the clarification! :-) The mergesegs command has a -i option to index the output segment. Perhaps the SegmentSlicer command could be modified to optionally index the output segments, too? New question: aside from slicing URLs by a Perl5 pattern, is there a way to slice an

[jira] Created: (NUTCH-146) mapred.job.tracker.info.port is defined 2 times in the nutch-default.xml

2005-12-20 Thread Stefan Groschupf (JIRA)
mapred.job.tracker.info.port is defined 2 times in the nutch-default.xml Key: NUTCH-146 URL: http://issues.apache.org/jira/browse/NUTCH-146 Project: Nutch Type: Bug Reporter: Stefan

RE: [Nutch-dev] distributed search

2005-12-20 Thread Ledio Ago
Thank you Dave, very helpful. -Ledio -Original Message- From: Goldschmidt, Dave [mailto:[EMAIL PROTECTED] Sent: Tuesday, December 20, 2005 7:24 AM To: nutch-dev@lucene.apache.org Subject: RE: [Nutch-dev] distributed search Hi Rafi, Not sure if anyone answered this, but I think you're

[jira] Resolved: (NUTCH-146) mapred.job.tracker.info.port is defined 2 times in the nutch-default.xml

2005-12-20 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-146?page=all ] Sami Siren resolved NUTCH-146: -- Fix Version: 0.8-dev Resolution: Fixed Assign To: Sami Siren mapred.job.tracker.info.port is defined 2 times in the nutch-default.xml

[jira] Resolved: (NUTCH-145) build of war file fails on Chinese (zh) .xml files due to UTF-8 BOM

2005-12-20 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-145?page=all ] Sami Siren resolved NUTCH-145: -- Fix Version: 0.8-dev Resolution: Fixed Assign To: Sami Siren this is now committed, thanks build of war file fails on Chinese (zh) .xml files due

nightly build

2005-12-20 Thread tigger .
Hi All, the nightly build is not working: bin/nutch admin db -create Exception in thread "main" java.lang.NoClassDefFoundError: admin nutch-2005-12-18.tar.gz 18-Dec-2005 00:54 50M Thanks

Re: [bug] overwriting job properties until runtime is not possible

2005-12-20 Thread Stefan Groschupf
Hi Paul, wouldn't it be a better and maybe easier solution to have an ArrayList for all values of a key and just add the values to the ArrayList? Then we can have a getProperty method that returns the first value in the list and a getProperties that returns an array. This could be very similar to
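The multi-valued approach suggested above might look like the following sketch. The class name `MultiValueProperties` and the `add` method are hypothetical; only the `getProperty`/`getProperties` split comes from the message, and this is not the actual Nutch job configuration code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical container: every key maps to a list of values, so adding a
// value never overwrites an earlier one.
class MultiValueProperties {
    private final Map<String, List<String>> values = new HashMap<>();

    // Appends 'value' to the list stored under 'key'.
    void add(String key, String value) {
        values.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // Returns the first value added for 'key', or null if there is none.
    String getProperty(String key) {
        List<String> list = values.get(key);
        return (list == null || list.isEmpty()) ? null : list.get(0);
    }

    // Returns all values for 'key' in insertion order (empty array if none).
    String[] getProperties(String key) {
        List<String> list = values.get(key);
        return list == null ? new String[0] : list.toArray(new String[0]);
    }
}
```

Returning the first value from `getProperty` keeps single-valued callers working unchanged while `getProperties` exposes the full list.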

Re: NDFS Connection reset

2005-12-20 Thread Paul Baclace
I have recently seen the connection reset problem, and no firewall was involved. I have been doing a mapred index build over more than 5TB of arc files and I noticed: SocketException: Connection reset that occurred in 1 of 1070 map tasks during the parse phase; the task was automatically