You can copy the index to the local file system and configure Nutch to read from the local copy. By default Nutch reads from HDFS. RMI is not needed: to Lucene, a file on HDFS is just another kind of input stream, and HDFS provides file operation APIs much like those of a local file system.
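
As a rough sketch of the local setup, something like the following might go in the search webapp's nutch-site.xml. I'm assuming a Nutch 1.x install on a Hadoop 0.20-era release, where the relevant property names are fs.default.name and searcher.dir. The path /data/nutch/crawl is only a placeholder for wherever you copied the crawl directory (for instance with `bin/hadoop fs -copyToLocal`); adjust both to your own layout.

```xml
<!-- Sketch only: assumes a Nutch 1.x search webapp; the path is hypothetical. -->
<configuration>
  <!-- Resolve paths against the local file system rather than HDFS. -->
  <property>
    <name>fs.default.name</name>
    <value>file:///</value>
  </property>
  <!-- Local copy of the crawl directory (the one holding the index). -->
  <property>
    <name>searcher.dir</name>
    <value>/data/nutch/crawl</value>
  </property>
</configuration>
```

If fs.default.name points at an hdfs:// URI instead, the same searcher.dir would be resolved on HDFS, which is the behaviour described above.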

On Wed, Jun 30, 2010 at 6:47 PM, 罗磊 <[email protected]> wrote:
> Hi:
>
> Though I'm really not good at English, I still prefer English to let others
> know what we are talking about.
>
> As [email protected] said, Nutch will copy the index file to native
> filesystem. Could you tell me what technology Nutch use to search? Is RMI
> or something else used?
>
> Thanks
>
> 2010/6/30 蒋明原 <[email protected]>
>
>> hi luo:
>>
>> Nutch does indeed use Lucene indexes, but the index is kept on HDFS so that
>> operations such as index merging can use the computing power of the Hadoop
>> platform. Running these operations on Hadoop is far more capable than
>> running them on a single machine. Once processing is finished, the index
>> can be downloaded to the local machine and accessed there; the index does
>> not also sit on HDFS while the search service is running.
>>
>> MapFile and SequenceFile are used so that the Hadoop platform can process
>> the data and eventually generate the index. MapFile and SequenceFile are
>> not how the index itself is stored; they hold the raw data, such as web
>> page source...... (This is only a rough description. See the introduction
>> to MapFile and SequenceFile in Hadoop: The Definitive Guide to learn their
>> characteristics.)
>>
>> On Wed, Jun 30, 2010 at 10:06 AM, 罗磊 <[email protected]> wrote:
>> > Hi all:
>> >
>> > I heard that Nutch put Lucene index file on HDFS, and wait for searcher.
>> > As far as I know, HDFS is not designed for low-latency visiting.
>> >
>> > So why Nutch put index file on HDFS? why not stored on local filesystem,
>> > and use normally RPC to call search function?
>> >
>> > I also heard that Nutch used MapFile a lot, do you think put those data
>> > on HBase is a good alternative?
>> >
>> > Thank you in advance

