Hi Nutch developers,

I am new to Nutch and Lucene, and I am using Nutch to do some heuristic
testing for web search.

Could you verify the following?

As far as I can tell, the indices are currently organized like this:

db - has the web graph

segments - has everything else, such as the actual page content, anchor
texts, titles, etc.

(0) Is this correct?

I need to add fields to the indices, such as:

"texts in bold" - ...
"description keywords" ...

(1) So this will only affect the segments indices, and not the db index
at all, right?

(2) Also, could you point me to the name of the actual algorithm Lucene
uses to score a document against a query? I did take a look at the
Similarity classes. It looks like some sort of vector space model, since
there are functions like queryNorm.
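
To make question (2) concrete, here is a minimal sketch of what I mean by
a vector space model. It is my own toy illustration in Python, not
Lucene's code: the function names (tf_idf_vector, vector_norm, score), the
sqrt(tf) weighting, and the smoothed idf are all assumptions of mine. My
guess is that queryNorm plays a role similar to the 1/|q| factor below.

```python
import math
from collections import Counter


def tf_idf_vector(tokens, doc_freq, num_docs):
    """Map a token list to a {term: weight} vector (hypothetical weighting)."""
    vec = {}
    for term, tf in Counter(tokens).items():
        # Smoothed idf (my assumption); it stays positive for any doc_freq.
        idf = 1.0 + math.log(num_docs / (1.0 + doc_freq.get(term, 0)))
        vec[term] = math.sqrt(tf) * idf
    return vec


def vector_norm(vec):
    """Euclidean length of a sparse vector."""
    return math.sqrt(sum(w * w for w in vec.values()))


def score(query_tokens, doc_tokens, doc_freq, num_docs):
    """Cosine similarity between the query vector and the document vector."""
    q = tf_idf_vector(query_tokens, doc_freq, num_docs)
    d = tf_idf_vector(doc_tokens, doc_freq, num_docs)
    dot = sum(w * d.get(term, 0.0) for term, w in q.items())
    q_len, d_len = vector_norm(q), vector_norm(d)
    if q_len == 0.0 or d_len == 0.0:
        return 0.0
    # Dividing by q_len is the "query normalization" step in this sketch.
    return dot / (q_len * d_len)


# Toy corpus, purely for illustration.
docs = [
    ["nutch", "web", "search"],
    ["lucene", "index", "search"],
    ["cooking", "recipes"],
]
doc_freq = Counter(term for doc in docs for term in set(doc))
query = ["nutch", "search"]
scores = [score(query, doc, doc_freq, len(docs)) for doc in docs]
```

With this toy corpus, the first document shares two terms with the query,
the second shares one, and the third shares none, so the scores come out
in that order. Is this roughly the model Lucene implements?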

Thanks.


____________________________________________________________________
Vikas Gupta
Final Year Masters Student,   http://www.cs.utexas.edu/users/vgupta
Dept. of Computer Sciences,
Univ. of Texas at Austin, USA
____________________________________________________________________


_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
