Hi,

Although my English is not very good, I prefer to write in English so that others can follow what we are talking about.
As [email protected] said, Nutch copies the index files to the local filesystem. Could you tell me what technology Nutch uses to serve searches? Is it RMI or something else?

Thanks

2010/6/30 蒋明原 <[email protected]>
> hi luo:
>
> Nutch does indeed use Lucene indexes. The index is placed on HDFS so that
> the computing power of the Hadoop platform can be used for operations such
> as merging indexes, which Hadoop handles far better than a single machine.
> Once processing is finished, the index can be downloaded to the local
> filesystem and searched from there; the index does not stay on HDFS while
> the search service is running.
>
> MapFile and SequenceFile are used so that Hadoop can process the data and
> eventually generate the index. MapFile and SequenceFile are not the way the
> index itself is stored; they hold raw data, such as page source... (this is
> only the general idea; see the sections on MapFile and SequenceFile in
> Hadoop: The Definitive Guide to understand their characteristics)
>
> On Wed, Jun 30, 2010 at 10:06 AM, 罗磊 <[email protected]> wrote:
> > Hi all:
> >
> > I heard that Nutch puts its Lucene index files on HDFS and serves
> > searches from there. As far as I know, HDFS is not designed for
> > low-latency access.
> >
> > So why does Nutch put the index files on HDFS? Why not store them on the
> > local filesystem and use ordinary RPC to call the search function?
> >
> > I also heard that Nutch uses MapFile a lot. Do you think putting that
> > data in HBase is a good alternative?
> >
> > Thank you in advance
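To make the MapFile point above more concrete: a Hadoop MapFile is a sorted SequenceFile ("data") paired with a small "index" file that records only every N-th key and its position. That layout suits bulk, mostly sequential processing plus lookups by key, but it is not a Lucene-style inverted index. The Python class below is a hypothetical, in-memory sketch of that layout only; `SparseIndexedFile`, the `interval` parameter and the example URLs are made up for illustration and are not Hadoop's API (the real MapFile works on byte offsets in files).

```python
import bisect
from typing import List, Optional, Tuple


class SparseIndexedFile:
    """Hypothetical toy model of the MapFile idea: sorted records plus a sparse index."""

    def __init__(self, records: List[Tuple[str, str]], interval: int = 4):
        keys = [k for k, _ in records]
        # Like a MapFile writer, require records to arrive in sorted key order.
        if keys != sorted(keys):
            raise ValueError("records must be added in sorted key order")
        self.records = records
        # Sparse index: only every `interval`-th key and its position in the data.
        self.index_pos = list(range(0, len(records), interval))
        self.index_keys = [records[i][0] for i in self.index_pos]

    def get(self, key: str) -> Optional[str]:
        # Binary-search the sparse index for the last indexed key <= key.
        i = bisect.bisect_right(self.index_keys, key) - 1
        if i < 0:
            return None
        # Scan the sorted data from that position; stop at the first larger key,
        # so only about one interval of records is read.
        for k, v in self.records[self.index_pos[i]:]:
            if k == key:
                return v
            if k > key:
                break
        return None


if __name__ == "__main__":
    pages = SparseIndexedFile(
        [(f"http://example.com/{i:03d}", f"<html>page {i}</html>") for i in range(10)]
    )
    print(pages.get("http://example.com/007"))  # <html>page 7</html>
```

Only the sparse index needs to sit in memory, which is why this kind of file is cheap to keep on HDFS and scan with MapReduce, while full-text search still needs a separate index such as Lucene's.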

