Scott Green wrote:
> Hi list,
>
> First, I am not sure whether the nutch-dev mailing list is the right
> place for this topic. If I have posted in the wrong place, please tell
> me where I should ask instead. Thanks.
>
> The question is: how can resources be indexed in real time in Nutch?
> The question comes from using GMail. I don't know exactly what is
> behind GMail, but it is presumably built on GFS. When I receive or send
> an email and press "Search Mail" immediately afterwards, the search
> always finds it. I would appreciate it if someone could explain how
> GMail works.
>
> Also, any advice on how to hack Nutch/Hadoop to achieve this? Thanks
>
Hi,
Most projects at Google use a scalable data structure called Bigtable.
Orkut, Google Earth, Google Finance and Writely are reported to use it,
and I suppose GMail does too. Bigtable is built on GFS and designed to
scale to the petabyte level, and they are working to increase that to
the next level.

As far as I know, you have to either rebuild the index each time or
build a new index and merge it with the existing ones, so there is no
online index building. Consider asking this on the Lucene mailing list.

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers