Indeed, Ning did a very nice job. This could also be used as an alternative to the rsync scripts (Solr, Nutch) by utilizing Hadoop.
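
For what it's worth, here is a minimal sketch of what that rsync replacement might look like on the search-server side, using Hadoop's FileSystem API to pull a shard from HDFS onto the local disk. The class name and paths below are made up for illustration, not anything from the contrib indexer:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/*
 * Sketch: fetch an index shard from HDFS to the local filesystem of a
 * search server, instead of rsync-ing it from another box.
 * Shard names and paths are hypothetical.
 */
public class ShardFetcher {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // assumes fs.default.name points at the HDFS namenode
    FileSystem hdfs = FileSystem.get(conf);

    Path remoteShard = new Path("/indexes/shard-0001");      // hypothetical HDFS location
    Path localShard  = new Path("/data/search/shard-0001");  // hypothetical local target

    // copy the whole shard directory (Lucene index plus any other
    // per-shard data) down to the disk the search server reads from
    hdfs.copyToLocalFile(false, remoteShard, localShard);
  }
}

Obviously the real thing would need to handle versioning and atomic swap-in of the new shard, but the basic transfer is just a FileSystem copy.
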
----- Original Message ----
> From: Andrzej Bialecki <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Friday, 21 March, 2008 8:30:23 AM
> Subject: Re: Distributed Indexer?
>
> [EMAIL PROTECTED] wrote:
> > Hi,
> >
> > I see Andrzej is getting busy with JIRA.... new release in April?
>
> That's the idea ... I could use some help, too ;)
>
> > I just noticed something nice in Hadoop's svn repo - distributed
> > Lucene indexer. Is anyone thinking about providing support for that
> > in Nutch? Or do people think this is not needed because in the end
> > people tend to create a number of relatively small indices (5-10M
> > docs) as opposed to one larger index?
>
> It caught my eye, too. I think it's nice that this tool uses low-level
> knowledge of Lucene segments to minimize the index churn - however, if I
> understand it correctly, the resulting indexes it maintains are still
> located on HDFS. Also, this doesn't address the extended concept of
> "shard" in Nutch, which consists not only of a Lucene index but also
> contains binary content, parse data and parse text ...
>
> From our point of view it would be useful if it were to move two steps
> further, i.e. include the management of other binary data (no longer so
> trivial, eh?), and then offer a functionality to do this transparently
> so that the shards end up on local filesystems of search servers, and
> the low-level segment management is done there ...
>
> --
> Best regards,
> Andrzej Bialecki <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
