Hi! I want to get more insight into various search engine algorithms. I have broad knowledge of standard data structures and algorithms (hashing, trees, graphs, etc.). I thought Lucene would be a good place to start looking for information, and indeed I've found some decent material on the Nutch website. However, I decided to post my thoughts here in the hope that someone can point me to even more information.
As far as I understand, I should read books about Information Retrieval (e.g. Modern Information Retrieval by Baeza-Yates and Ribeiro-Neto). Is there anything more up to date?

Starting from one article about link spam and searching CiteSeer, I also found several articles about link spam techniques, namely:

1. Undue Influence: Eliminating the Impact of Link Plagiarism on Web Search Rankings
2. Using Rank Propagation and Probabilistic Counting for Link-Based Spam Detection
3. SpamRank: Fully Automatic Link Spam Detection
4. Identifying Link Farm Spam Pages
5. Thwarting the Nigritude Ultramarine: Learning to Identify Link Spam

If you have further suggestions for valuable literature on search engine algorithms (primarily books, but good articles would work too), please let me know.

Thanks, and keep up the good work.

--
Mladen Adamovic
http://www.online-utility.org
http://www.cheapvps.info
http://www.vpsreview.com
http://www.vpsdeal.com

_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
