Hi Dennis,

"Nutch's original intention was as a large-scale www search engine."

I agree with you completely! Nutch's goal is specifically to process large-scale data, the way Google does. There is no doubt that Nutch will be a www search engine, and absolutely not a vertical search engine!
I am confident that Hadoop can process the large volumes of data a www search engine needs. But Lucene? I am afraid the index size Lucene can handle per server is quite small. 10 GB? 30 GB? That is not enough for a www search engine. IMO, this is a bottleneck. How many pages does Visvo search currently? 100 million? 1,000 million?

IMO, moving Nutch to a top-level Apache project, out from under the Lucene umbrella, would be very good. But all of Nutch's sub-projects would need to be active enough; if not, Nutch's development will be slow, which is bad for Nutch's unity. So the number of sub-projects should be small, and each sub-project should be active, efficient, and strong enough.

Good luck!

Dennis Kubes-2 wrote:
>
> With the release of Nutch 1.0 I think it is a good time to begin a
> discussion about the future of Nutch. Here are some things to consider
> and would love to hear everyone's views on this.
>
> Nutch's original intention was as a large-scale www search engine. That
> is a very specific goal. Only a few people and organizations actually
> use it on that level. (I just happen to be one of them as most of my
> work focuses on large scale web search as opposed to vertical search).
> Many, perhaps most, people using Nutch these days are either using parts
> of Nutch, such as the crawler, or are targeting towards vertical or
> intranet type search engines. This can be seen in how many people have
> already started using the Solr integration features. So while Nutch was
> originally intended as a www search, IMO most people aren't using it for
> that purpose.
>
> Since there are different purposes for different users, would it be good
> to consider moving Nutch to a top level apache project out from under
> the Lucene umbrella? This would then allow the creation of nutch
> sub-projects, such as nutch-solr, nutch-hbase. Thoughts?
>
> Many parts of Nutch have also been implemented in other projects.
> For example, Tika for the parsers, Droids for the crawler. It begs the
> question of what Nutch's core features are going forward. When I think
> about search (again my perspective is large scale), I think crawling or
> acquisition of data, parsing, analysis, indexing, deployment, and
> searching. I personally think that there is much room for improvement
> in crawling and especially analysis. Nutch shouldn't just be about the
> shell but also the brains.
>
> And one of the biggest things I see is many newcomers to nutch have a
> very hard time getting started. Part of this is understanding mapreduce
> mentality, part is documentation, part is there is only so much time
> some of us have to answer questions so some questions go unanswered on
> the lists. How might this be improved going forward?
>
> Any other thoughts also welcome. Really I want to start a discussion
> about where everyone thinks we are with the state of Nutch and its future.
>
> Dennis
>

--
View this message in context: http://www.nabble.com/The-Future-of-Nutch-tp22507507p22508747.html
Sent from the Nutch - User mailing list archive at Nabble.com.