Hi Nutch developers, I am new to Nutch and Lucene, and I am using Nutch to do some heuristic testing for web search.
Could you verify the following? As I understand it, the indices are currently organized like this:

db - has the web graph
segments - has the rest of the stuff, like actual page content, anchor texts, title, etc.

(0) Is this correct?

I need to add fields to the indices, like
"texts in bold" - ...
"description keywords" ...

(1) So this will only affect the segments indices, and not the db index at all, right?

(2) Also, could you point me to the name of the actual algorithm Lucene uses to compute the score of a query against the indices? I did take a look at the Similarity classes. It looks like some sort of vector space model, as there are functions like queryNorm.

Thanks.
____________________________________________________________________
Vikas Gupta
Final Year Masters Student, http://www.cs.utexas.edu/users/vgupta
Dept. of Computer Sciences, Univ. of Texas at Austin, USA
____________________________________________________________________
