On Thursday, November 24, 2005, at 04:03PM, Jérôme Charron <[EMAIL PROTECTED]> wrote:
>> Until last years there is one thing I notice that matters in a search
>> engine - minimalism.
>
>If you are honest Stefan, take a closer look at the end of the proposal
>(here is a copy):
>Issues
>
>Create performance benchmarks and ensure that the new implementation gives
>at least the same performance as the parse-html plugin (the most used parse
>plugin in whole-web crawling)
>
>Minimalism.
>
>> Minimalism == speed, speed == scalability,
>
>speed == scalability ????
>Oh, damned, is it a new theory Stefan?
>
>> scalability == serious
>
>high availability == serious (too)
>monitoring == serious (too)
>there is a lot of serious stuff you know, and I really think that
>features == serious (too)
>
>> I don't think it would be a good move to slow down html parsing (most
>> used parser) to make rss parser writing easier for developers.
>
>One more time: take a closer look at the proposal. The idea is to provide a
>convenient way to add some markup language related plugins (you know rss
>and atom are the first steps to a more structured content... more is to come)
>Not replacing the existing html and rss ones if their performance is
>better.
>Adapting the html and rss parsers to the proposal is just for architecture
>"beauty" purposes, but it is not mandatory.
>You know, actually, Nutch is widely used for thematic and intranet search
>engines. And in such a context this proposal really makes sense (as in such
>a context it makes sense to have a protocol-jdbc plugin for instance).
>
>> From my perspective we have much more general things to solve in
>> nutch (manageability, monitoring, ndfs block based task-routing, more
>> dynamic search servers) than improving things we already have.
>
>It's your point of view.
>You know, I think there is something magic in nutch. It is that people are
>focused on different subjects.
>Some are more focused on infrastructure, some others on parsing, some others
>on language technology...
>That's a big chance for nutch... our complementarity...
>(but it's true the subjects you mentioned are some very interesting
>improvements, especially monitoring. It cannot be a serious product deployed
>on many nodes if there is no way to monitor the whole system).
>
>> Anyway as you may know we have a plugin system and one goal of the
>> plugin system is to give developers the freedom to develop custom
>> plugins. :-)
>
>Yes, since I have corrected many bugs in the plugin system (not yours I
>hope), I clearly understand how it works, and what its goal is...
> ;-)
>
>> P.S. Do you think it makes sense to run another public nutch mailing
>> list, since 'THE nutch [...]' (mailing list is
>> nutch-[EMAIL PROTECTED]), 'Isn't it?'
>> http://www.mail-archive.com/[email protected]/msg01513.html
>
>Is there another public nutch mailing list somewhere Stefan?
>Please give me the address...
>
>Best Regards
>
>Jérôme
>
>--
>http://motrech.free.fr/
>http://www.frutch.org/

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
