RE: compile search.jsp

2006-03-06 Thread Sylvain FURMANEK
Hi, You must modify the search.jsp page directly on the Tomcat server. -Original Message- From: Michael Ji [mailto:[EMAIL PROTECTED] Sent: Sunday, March 5, 2006 04:04 To: nutch-dev@lucene.apache.org Subject: compile search.jsp Hi, I made a change in search.jsp under /nutch/src/web/jsp and

record termination and MapReduce

2006-03-06 Thread Toby DiPasquale
Hi all, I have a question about the MapReduce and NDFS implementations. When writing records into an NDFS file, how does one make sure that records terminate cleanly on block boundaries such that a Map job's input does not span multiple physical blocks? It also appears as if NDFS does not have

Nutch web site

2006-03-06 Thread Piotr Kosiorowski
Hi, It looks like the Nutch web site was updated with a site built from the latest trunk; the only problem is that it contains the tutorial for the (as yet) unreleased version 0.8. I think we talked about it and agreed to keep the tutorial for the latest release on the Web. I have just updated the site in svn (branch-0.7) with

Re: Nutch web site

2006-03-06 Thread Andrzej Bialecki
Piotr Kosiorowski wrote: Hi, It looks like the Nutch web site was updated with a site built from the latest trunk; the only problem is that it contains the tutorial for the (as yet) unreleased version 0.8. I think we talked about it and agreed to keep the tutorial for the latest release on the Web. I have just updated the site

Re: Nutch web site

2006-03-06 Thread Doug Cutting
Piotr Kosiorowski wrote: It looks like the Nutch web site was updated with a site built from the latest trunk; the only problem is that it contains the tutorial for the (as yet) unreleased version 0.8. I think we talked about it and agreed to keep the tutorial for the latest release on the Web. I have just updated the site in svn

[jira] Created: (NUTCH-224) Nutch doesn't handle Korean text at all

2006-03-06 Thread KuroSaka TeruHiko (JIRA)
Nutch doesn't handle Korean text at all --- Key: NUTCH-224 URL: http://issues.apache.org/jira/browse/NUTCH-224 Project: Nutch Type: Bug Components: indexer Versions: 0.7.1 Reporter: KuroSaka TeruHiko I was

Re: record termination and MapReduce

2006-03-06 Thread Doug Cutting
Toby DiPasquale wrote: I have a question about the MapReduce and NDFS implementations. When writing records into an NDFS file, how does one make sure that records terminate cleanly on block boundaries such that a Map job's input does not span multiple physical blocks? We do not currently

found resource parse-plugins.xm?

2006-03-06 Thread Stefan Groschupf
Hi, after a short time I already had this line 1602 times in my tasktracker log files. 060307 022707 task_m_2bu9o4 found resource parse-plugins.xml at file:/home/joa/nutch/conf/parse-plugins.xml Sounds like this file is loaded 1602 times (after, let's say, 3 minutes). I guess that wasn't the goal

Re: found resource parse-plugins.xm?

2006-03-06 Thread Stefan Groschupf
Hi Stack, :) yes! Until fetching with parsing switched on, on one tasktracker that tries to crawl a 10 million segment with 800 threads. :-? Stefan On 07.03.2006 at 04:27, [EMAIL PROTECTED] wrote: Stefan Groschupf wrote: Hi, after a short time I already had this line 1602 times in my

RE: found resource parse-plugins.xm?

2006-03-06 Thread Chris Mattmann
Hi Stefan, after a short time I already had this line 1602 times in my tasktracker log files. 060307 022707 task_m_2bu9o4 found resource parse-plugins.xml at file:/home/joa/nutch/conf/parse-plugins.xml Sounds like this file is loaded 1602 times (after, let's say, 3 minutes). I guess that wasn't

Re: found resource parse-plugins.xm?

2006-03-06 Thread Stefan Groschupf
Hi Chris, thanks for the clarification. Do you think we can somehow cache it in the nutchConf instance, since this is the way we are doing this in other places as well? Cheers, Stefan On 07.03.2006 at 04:38, Chris Mattmann wrote: Hi Stefan, after a short time I already had this line 1602 times in my

RE: found resource parse-plugins.xm?

2006-03-06 Thread Chris Mattmann
Hi Stefan, Hi Chris, thanks for the clarification. No probs. Do you think we can somehow cache it in the nutchConf instance, since this is the way we are doing this in other places as well? Yeah I think we can. Here is a small patch to the ParserFactory that should do the trick. Give it a
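
[Editor's illustration, not part of the thread] The caching idea discussed above could look roughly like the sketch below. It is not the patch from this thread, which is truncated in this listing. Every name here (ConfSketch, getOrLoad, ParsePluginListSketch, ParserFactorySketch) is hypothetical and does not come from the Nutch code base; the sketch only assumes that the configuration instance can hold arbitrary cached objects, so that an expensive resource such as parse-plugins.xml is loaded once per configuration rather than once per factory.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical stand-in for a configuration object that can hold cached objects. */
class ConfSketch {
    private final Map<String, Object> cache = new ConcurrentHashMap<>();

    /** Returns the object cached under key, building it at most once per ConfSketch instance. */
    @SuppressWarnings("unchecked")
    <T> T getOrLoad(String key, Supplier<T> loader) {
        return (T) cache.computeIfAbsent(key, k -> loader.get());
    }
}

/** Hypothetical stand-in for the parsed contents of parse-plugins.xml. */
class ParsePluginListSketch {
    /** Counts how often the (simulated) expensive load actually ran. */
    static final AtomicInteger LOADS = new AtomicInteger();

    static ParsePluginListSketch load() {
        LOADS.incrementAndGet(); // a real loader would read and parse the XML file here
        return new ParsePluginListSketch();
    }
}

/** Hypothetical factory that reuses the list cached in the configuration. */
class ParserFactorySketch {
    private final ParsePluginListSketch plugins;

    ParserFactorySketch(ConfSketch conf) {
        // Only the first factory created for a given conf triggers a load.
        this.plugins = conf.getOrLoad("parse-plugins", ParsePluginListSketch::load);
    }

    ParsePluginListSketch plugins() {
        return plugins;
    }
}
```

With this arrangement, creating many factories against the same configuration instance would produce a single load, whereas a fresh configuration instance would load the file again.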

RE: found resource parse-plugins.xm?

2006-03-06 Thread Chris Mattmann
Sorry, My last patch was missing one line. Here's the update: Index: src/java/org/apache/nutch/parse/ParserFactory.java === --- src/java/org/apache/nutch/parse/ParserFactory.java (revision 383463) +++

Re: Nutch web site

2006-03-06 Thread Piotr Kosiorowski
Andrzej Bialecki wrote: +1, yes it would be really confusing. Since there are more and more people trying 0.8, could we perhaps include a short note that 0.8 and later is NOT compatible with this tutorial, and a reference to the tutorial for 0.8 (or the trunk/ branch in general)? I can

RE: Nutch web site

2006-03-06 Thread Richard Braman
No, that sounds good to me. I also think that the whole web vs. crawl needs to be better explained. I will write a bug/patch for it tomorrow. -Original Message- From: Piotr Kosiorowski [mailto:[EMAIL PROTECTED] Sent: Tuesday, March 07, 2006 1:13 AM To: nutch-dev@lucene.apache.org