Indeed, Ning did a very nice job. This could also be used as a Hadoop-based 
alternative to the rsync index-distribution scripts (Solr, Nutch).
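For instance, the rsync step could in principle be replaced with a plain HDFS copy onto each search server. A minimal sketch of that idea — the shard name and paths below are hypothetical, and it assumes a stock Hadoop client with the standard `fs -copyToLocal` command on the PATH:

```shell
# Sketch (assumption): pull a finished shard from HDFS onto a search
# server's local disk instead of rsync-ing it from an indexing box.
# Shard name and paths are hypothetical examples.
SHARD=shard-0001
SRC=/user/nutch/shards/$SHARD
DST=/opt/search/shards/$SHARD

if command -v hadoop >/dev/null 2>&1; then
  # -copyToLocal fetches the whole shard directory from HDFS
  hadoop fs -copyToLocal "$SRC" "$DST"
else
  # no Hadoop client here; just show what would happen
  echo "hadoop client not on PATH; would copy $SRC -> $DST"
fi
```

This only covers the transfer itself, of course — it says nothing about the segment-level index management the tool does, or about the extra per-shard data discussed below.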


----- Original Message ----
> From: Andrzej Bialecki <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Friday, 21 March, 2008 8:30:23 AM
> Subject: Re: Distributed Indexer?
> 
> [EMAIL PROTECTED] wrote:
> > Hi,
> > 
> > I see Andrzej is getting busy with JIRA.... new release in April?
> 
> That's the idea ... I could use some help, too ;)
> 
> > 
> > I just noticed something nice in Hadoop's svn repo - distributed
> > Lucene indexer.  Is anyone thinking about providing support for that
> > in Nutch?  Or do people think this is not needed because in the end
> > people tend to create a number of relatively small indices (5-10M
> > docs) as opposed to one larger index?
> 
> It caught my eye, too. I think it's nice that this tool uses low-level 
> knowledge of Lucene segments to minimize the index churn - however, if I 
> understand it correctly, the resulting indexes it maintains are still 
> located on HDFS. Also, this doesn't address the extended concept of 
> "shard" in Nutch, which consists not only of a Lucene index but also 
> contains binary content, parse data and parse text ...
> 
> From our point of view it would be useful if it were to move two steps 
> further, i.e. include the management of other binary data (no longer so 
> trivial, eh?), and then offer a functionality to do this transparently 
> so that the shards end up on local filesystems of search servers, and 
> the low-level segment management is done there ...
> 
> 
> -- 
> Best regards,
> Andrzej Bialecki     <><
>   ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
> 
> 



