Re: Lucene index file on HDFS

蒋明原 Wed, 30 Jun 2010 02:02:57 -0700

Hi Luo,

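Nutch does use Lucene indexes, but the index is put on HDFS so that the
computing power of the Hadoop platform can be used for operations on the
index, such as merging. Running these operations on Hadoop is far faster
than doing them on a single machine. Once processing is finished, the index
can be downloaded to the local filesystem and accessed there; the index does
not have to stay on HDFS while the search service is running.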
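As a rough illustration of the "download the index to local disk" step, here
is a minimal Python sketch that wraps the standard `hadoop fs -get` command.
The function names and both paths are hypothetical, made up for this example;
this is not how Nutch itself does it, only one way the copy could be scripted.

```python
import shlex
import subprocess


def build_fetch_command(hdfs_index_dir, local_dir):
    """Return the argv list that copies a directory from HDFS to local disk."""
    # `hadoop fs -get <src> <localdst>` is the stock Hadoop shell command.
    return ["hadoop", "fs", "-get", hdfs_index_dir, local_dir]


def fetch_index(hdfs_index_dir, local_dir, dry_run=True):
    """Copy a finished index out of HDFS so a searcher can open it locally.

    With dry_run=True the command is only rendered as a string, which makes
    it easy to inspect without a Hadoop installation.
    """
    cmd = build_fetch_command(hdfs_index_dir, local_dir)
    if dry_run:
        return " ".join(shlex.quote(part) for part in cmd)
    subprocess.run(cmd, check=True)  # raises if the hadoop CLI reports failure
    return local_dir


# Hypothetical paths, for illustration only.
print(fetch_index("/user/nutch/crawl/index", "/data/search/index"))
```

After the copy, the search process opens the index from the local directory
like any other Lucene index, so query latency does not depend on HDFS.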


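MapFile and SequenceFile are used so that the data can be processed on the
Hadoop platform, with the index as the final output. MapFile and SequenceFile
are not the format the index is stored in; they hold the raw data, such as
page source... (This is only the general idea. See the sections on MapFile and
SequenceFile in Hadoop: The Definitive Guide to learn their characteristics.)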
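To make the MapFile idea a little more concrete: a Hadoop MapFile is a sorted
SequenceFile of key/value records plus a small index holding every N-th key
(the default interval is 128, as far as I recall). The toy Python class below
imitates that layout in memory. `ToyMapFile` and the sample URLs are invented
for this sketch; it is not the Hadoop API, and it says nothing about how
Lucene stores its index.

```python
import bisect


class ToyMapFile:
    """In-memory stand-in for a MapFile: sorted records plus a sparse index."""

    def __init__(self, records, index_interval=128):
        # A real MapFile requires keys to be appended in sorted order;
        # sorting here just keeps the example short.
        self.data = sorted(records)
        self.index_interval = index_interval
        # Sparse index: remember the key and position of every N-th record.
        self.index_pos = list(range(0, len(self.data), index_interval))
        self.index_keys = [self.data[i][0] for i in self.index_pos]

    def get(self, key):
        # Binary-search the small index for the last indexed key <= key ...
        slot = bisect.bisect_right(self.index_keys, key) - 1
        if slot < 0:
            return None
        # ... then scan at most index_interval records from that position.
        start = self.index_pos[slot]
        for k, v in self.data[start:start + self.index_interval]:
            if k == key:
                return v
            if k > key:
                break
        return None


# Raw page source keyed by URL, the kind of data the reply describes.
pages = ToyMapFile(
    [
        ("http://a.example/", "<html>A</html>"),
        ("http://b.example/", "<html>B</html>"),
        ("http://c.example/", "<html>C</html>"),
        ("http://d.example/", "<html>D</html>"),
        ("http://e.example/", "<html>E</html>"),
    ],
    index_interval=2,
)
print(pages.get("http://d.example/"))
```

The point of the sparse index is that a lookup needs only the small index in
memory plus a short sequential read, which suits batch processing of crawled
pages but is a different thing from a Lucene inverted index.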
On Wed, Jun 30, 2010 at 10:06 AM, 罗磊 <[email protected]> wrote:
> Hi all:
>
> I heard that Nutch puts its Lucene index files on HDFS, where they wait for
> the searcher. As far as I know, HDFS is not designed for low-latency access.
>
> So why does Nutch put the index files on HDFS? Why not store them on the
> local filesystem and use ordinary RPC to call the search function?
>
> I also heard that Nutch uses MapFile a lot. Do you think putting that data
> in HBase is a good alternative?
>
> Thank you in advance
>
