Scott Green wrote:
> Hi list,
>
> Firstly, I don't know whether the nutch-dev mailing list is suitable for
> this topic or not. If I have posted in the wrong place, please tell me
> where I should ask this question. Thanks.
>
> The question is how to index resources in real time in Nutch. This
> question is prompted by GMail. I don't know what exactly is behind GMail,
> but it should be built on GFS. When I receive an email or send one
> out and push "Search Mail" immediately, it always finds it. I'd
> appreciate it if someone would explain how GMail works.
>
> And any advice on hacking Nutch/Hadoop to achieve this? Thanks
>

Hi,

Most projects at Google use a scalable data structure called Bigtable. Orkut, Google Earth, Google Finance and Writely are reported to use it, and I suppose GMail uses Bigtable as well. Bigtable is built on top of GFS and designed to scale to the petabyte level, and they are working to take it to the next level.
As far as I know, in Nutch you have to either rebuild the index each time or merge indexes, so there is no online (real-time) index building. Consider asking this on the Lucene mailing list.
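
To make the "merge the indexes" option concrete, here is a minimal sketch in Python of merging a small index of new documents into an existing one instead of rebuilding everything. It is only an illustration of the idea, not Nutch or Lucene code: the names build_index, merge_indexes and search and the sample documents are made up for this example. In Lucene itself, merging is done through IndexWriter (for example its addIndexes method).

```python
from collections import defaultdict


def build_index(docs):
    """Build a tiny inverted index: term -> sorted list of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def merge_indexes(main, delta):
    """Merge a small 'delta' index into the main index without re-reading
    the documents that are already indexed. Returns a new index."""
    merged = {term: list(ids) for term, ids in main.items()}
    for term, ids in delta.items():
        merged[term] = sorted(set(merged.get(term, [])) | set(ids))
    return merged


def search(index, term):
    """Return the ids of the documents containing the term."""
    return index.get(term.lower(), [])


# Existing (large) index, built once.
main_index = build_index({
    1: "nutch crawls the web",
    2: "lucene builds the index",
})

# Newly arrived documents are indexed separately, then merged in.
delta_index = build_index({3: "new mail arrives in the index"})
merged_index = merge_indexes(main_index, delta_index)

print(search(merged_index, "mail"))
```

New documents only become searchable after the merge step, which is why this approach is periodic rather than truly real time; how often to merge is a trade-off between freshness and merge cost.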