Latest version of Mapred

2005-12-19 Thread Rafi Iz
Hi all, I am currently working with Nutch 0.7.1 and want to start using mapred. Any ideas where I can find the latest version? BTW, I looked at the path http://svn.apache.org/repos/asf/lucene/nutch/branches/ but the only directory that exists there is branch-0.7/. Thanks, Raffi

Re: Latest version of Mapred

2005-12-19 Thread Stefan Groschupf
mapred is now in trunk... On 19.12.2005 at 18:46, Rafi Iz wrote: Hi all, I am currently working with Nutch 0.7.1, I want to start using the mapred, any ideas where I can find the latest version. B.T.W I looked at the path: http://svn.apache.org/repos/asf/lucene/nutch/branches/ but the only

Re: problems http-client

2005-12-19 Thread Andrzej Bialecki
Stefan Groschupf wrote: Anyway, today we noted that when fetching with http-client the sum of errors and fetched pages is much less than the size defined when generating the segment. Changing to protocol-http solves the problem. Has anyone else noticed this behavior? I haven't, but this

Re: problems http-client

2005-12-19 Thread Stefan Groschupf
OK, I will do that tomorrow! However, if it is known to be buggy, maybe we should not set it up as the default http protocol plugin, as it is today. Newbies checking out Nutch will use the version that does not fetch all pages, since most people start with the standard configuration. On 19.12.2005

Re: problems http-client

2005-12-19 Thread Michael
The same problem occurs on FreeBSD 6.0 + jdk1.4.2. I think it was also reported some time ago by Rod Taylor. Switch to protocol-http. SG Hi there, SG is there someone out there that can confirm a problem we discovered? SG We were wondering why not all pages of a generated segment were SG fetched.

Re: problems http-client

2005-12-19 Thread Andrzej Bialecki
Stefan Groschupf wrote: OK, I will do that tomorrow! However, if it is known to be buggy, maybe we should not set it up as the default http protocol plugin, as it is today. Newbies checking out Nutch will use the version that does not fetch all pages, since most people start with the standard

Re: [VOTE] Committer access for Stefan Groschupf

2005-12-19 Thread Piotr Kosiorowski
+1 - especially for the amount of support Stefan gives to Nutch users. P. Andrzej Bialecki wrote: Hi, During the past year and more, Stefan participated actively in the development, and contributed many high-quality patches. He's been spending considerable effort on addressing many issues in JIRA,

Re: Latest version of Mapred

2005-12-19 Thread Rafi Iz
Thanks for the fast response. Do you know where I can find a compressed version? Thanks, Rafi From: Stefan Groschupf [EMAIL PROTECTED] Reply-To: nutch-dev@lucene.apache.org To: nutch-dev@lucene.apache.org Subject: Re: Latest version of Mapred Date: Mon, 19 Dec 2005 19:00:29 +0100 mapred is

Re: Latest version of Mapred

2005-12-19 Thread Jérôme Charron
Thanks for the fast response. Do you know where I can find a compressed version? Here are the nightly builds: http://cvs.apache.org/dist/lucene/nutch/nightly/ Regards, Jérôme -- http://motrech.free.fr/ http://www.frutch.org/

RE: [Nutch-dev] distributed search

2005-12-19 Thread Ledio Ago
I tried moving Tomcat to a separate machine and bingo: performance went up by 30%. Right now I only have two machines with 900K URLs each that act as Nutch servers and one machine that hosts Tomcat. At this time I no longer suspect that Tomcat is synchronously

Re: [Nutch-dev] distributed search

2005-12-19 Thread Stefan Groschupf
By the way, is there an easy way to split the index I already have? I would hate to recrawl all of the 1.9MM URLs again and waste bandwidth. Well, I do not know of any tool that comes with Nutch, or any other tool, that does it; maybe there is one. But to write a java class that creates two

Re: [Nutch-dev] distributed search

2005-12-19 Thread Rafi Iz
Check the following command: FetchListTool (-local | -ndfs namenode:port) db segment_dir [-refetchonly] [-topN N] [-cutoff cutoffscore] [-numFetchers numFetchers] [-adddays numDays] This command calls a function called emitMultipleLists, which spits out several fetchlists, so that you can fetch

RE: [Nutch-dev] distributed search

2005-12-19 Thread Ledio Ago
I have the book so I'll check what I can do with the API. Thanks Stefan, Ledio -Original Message- From: Stefan Groschupf [mailto:[EMAIL PROTECTED] Sent: Monday, December 19, 2005 3:38 PM To: nutch-dev@lucene.apache.org Subject: Re: [Nutch-dev] distributed search By the way, is there