Re: Re: Searching against Database
I don't have any best practices to offer. I have been using Lucene with MySQL for a year, though. All I do is store a key of some sort in the index, e.g. new Field("id", getPK(), true, false, false), and then relate that to the database in code.

For live Oracle databases, you might consider different things. As I hear, Oracle lets you use Java in PL/SQL (no experience here), so you might consider adding some code to triggers to add and delete documents from the index. But modifying the index is not as quick as modifying a database in most cases, so you might want to come up with some sort of compromise on this. Perhaps more experienced users on this list will have better insights. Hope that helps.

On Thu, 15 Jul 2004 lingaraju wrote:
> Hello
>
> I am searching for the same code, as all my web display information is
> stored in a database. An early response will be very much appreciated.
>
> Thanks and regards
> Raju
>
> ----- Original Message -----
> From: Hetan Shah [EMAIL PROTECTED]
> To: Lucene Users List [EMAIL PROTECTED]
> Sent: Thursday, July 15, 2004 5:56 AM
> Subject: Searching against Database
>
> Hello All,
>
> I have got all the answers from this fantastic mailing list. I have
> another question ;)
>
> What is the best way (best practices) to integrate Lucene with a live
> database, Oracle to be more specific?
>
> Any pointers are very much appreciated. Thanks guys.
> -H
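A minimal sketch of the pattern described above — keep only the database primary key in the Lucene index and fetch the real row from the database after searching. This assumes the Lucene 1.4-era API; the field names and the idea of passing the key in as a string are illustrative, not from the original message.

```java
import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

// Sketch: index a database row, keeping only its primary key in Lucene.
public class DbIndexer {
    public static void addRow(IndexWriter writer, String pk, String body)
            throws IOException {
        Document doc = new Document();
        // Stored, indexed, untokenized -- usable both for retrieval and
        // for deleting a stale document by Term("id", pk) on updates.
        doc.add(Field.Keyword("id", pk));
        // The searchable text itself; not stored, since the real data
        // lives in the database.
        doc.add(Field.UnStored("body", body));
        writer.addDocument(doc);
    }
}
```

After searching, read `hits.doc(i).get("id")` and run a `SELECT ... WHERE pk = ?` against the database. Note this sketch uses Field.Keyword rather than the (true, false, false) constructor from the message above, so the key is also indexed and can be used to delete documents when the row changes.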
Re: RE: Scoring without normalization!
Sadly, I am still running into problems. Explain shows the following after the modification.

Rank: 1 ID: 11285358 Score: 5.5740864E8
5.5740864E8 = product of:
  8.3611296E8 = sum of:
    8.3611296E8 = product of:
      6.6889037E9 = weight(title:iron in 1235940), product of:
        0.12621856 = queryWeight(title:iron), product of:
          7.0507255 = idf(docFreq=10816)
          0.017901499 = queryNorm
        5.2994613E10 = fieldWeight(title:iron in 1235940), product of:
          1.0 = tf(termFreq(title:iron)=1)
          7.0507255 = idf(docFreq=10816)
          7.5161928E9 = fieldNorm(field=title, doc=1235940)
      0.125 = coord(1/8)
    2.7106019E-8 = product of:
      1.08424075E-7 = sum of:
        5.7318403E-9 = weight(abstract:an in 1235940), product of:
          0.03711049 = queryWeight(abstract:an), product of:
            2.073038 = idf(docFreq=1569960)
            0.017901499 = queryNorm
          1.5445337E-7 = fieldWeight(abstract:an in 1235940), product of:
            1.0 = tf(termFreq(abstract:an)=1)
            2.073038 = idf(docFreq=1569960)
            7.4505806E-8 = fieldNorm(field=abstract, doc=1235940)
        1.0269223E-7 = weight(abstract:iron in 1235940), product of:
          0.111071706 = queryWeight(abstract:iron), product of:
            6.2046037 = idf(docFreq=25209)
            0.017901499 = queryNorm
          9.24558E-7 = fieldWeight(abstract:iron in 1235940), product of:
            2.0 = tf(termFreq(abstract:iron)=4)
            6.2046037 = idf(docFreq=25209)
            7.4505806E-8 = fieldNorm(field=abstract, doc=1235940)
      0.25 = coord(2/8)
  0.667 = coord(2/3)

Rank: 2 ID: 8157438 Score: 2.7870432E8
2.7870432E8 = product of:
  8.3611296E8 = product of:
    6.6889037E9 = weight(title:iron in 159395), product of:
      0.12621856 = queryWeight(title:iron), product of:
        7.0507255 = idf(docFreq=10816)
        0.017901499 = queryNorm
      5.2994613E10 = fieldWeight(title:iron in 159395), product of:
        1.0 = tf(termFreq(title:iron)=1)
        7.0507255 = idf(docFreq=10816)
        7.5161928E9 = fieldNorm(field=title, doc=159395)
    0.125 = coord(1/8)
  0.3334 = coord(1/3)

Rank: 3 ID: 10543103 Score: 2.7870432E8
2.7870432E8 = product of:
  8.3611296E8 = product of:
    6.6889037E9 = weight(title:iron in 553967), product of:
      0.12621856 = queryWeight(title:iron), product of:
        7.0507255 = idf(docFreq=10816)
        0.017901499 = queryNorm
      5.2994613E10 = fieldWeight(title:iron in 553967), product of:
        1.0 = tf(termFreq(title:iron)=1)
        7.0507255 = idf(docFreq=10816)
        7.5161928E9 = fieldNorm(field=title, doc=553967)
    0.125 = coord(1/8)
  0.3334 = coord(1/3)

Rank: 4 ID: 8753559 Score: 2.7870432E8
2.7870432E8 = product of:
  8.3611296E8 = product of:
    6.6889037E9 = weight(title:iron in 2563152), product of:
      0.12621856 = queryWeight(title:iron), product of:
        7.0507255 = idf(docFreq=10816)
        0.017901499 = queryNorm
      5.2994613E10 = fieldWeight(title:iron in 2563152), product of:
        1.0 = tf(termFreq(title:iron)=1)
        7.0507255 = idf(docFreq=10816)
        7.5161928E9 = fieldNorm(field=title, doc=2563152)
    0.125 = coord(1/8)
  0.3334 = coord(1/3)

I would like to get rid of all normalizations and just have TF and IDF. What am I missing?

On Thu, 15 Jul 2004 Anson Lau wrote:
> If you don't mind hacking the source: in Hits.java, in method getMoreDocs():
>
> // Comment out the following
> //float scoreNorm = 1.0f;
> //if (length > 0 && scoreDocs[0].score > 1.0f) {
> //  scoreNorm = 1.0f / scoreDocs[0].score;
> //}
> // And just set scoreNorm to 1.
> float scoreNorm = 1.0f;
>
> I don't know if you can do it without going to the src.
>
> Anson
>
> -----Original Message-----
> From: Jones G [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, July 15, 2004 6:52 AM
> To: [EMAIL PROTECTED]
> Subject: Scoring without normalization!
>
> How do I remove document normalization from scoring in Lucene? I just
> want to stick to TF and IDF. Thanks.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
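The normalization in Hits.getMoreDocs() only rescales scores for display (so the top hit is at most 1.0). A way to get the raw scores without patching the source, assuming the 1.4-era API, is the lower-level HitCollector search path, which Hits never touches:

```java
import java.io.IOException;

import org.apache.lucene.search.HitCollector;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

// Sketch: collect raw scores without patching Hits.java.
// Searcher.search(Query, HitCollector) hands every matching doc to
// collect() with its raw score -- the 1/score[0] rescaling done in
// Hits.getMoreDocs() is never applied on this path.
public class RawScores {
    public static void search(IndexSearcher searcher, Query query)
            throws IOException {
        searcher.search(query, new HitCollector() {
            public void collect(int doc, float score) {
                System.out.println(doc + "\t" + score);
            }
        });
    }
}
```

Note that the raw scores still include queryNorm and the stored field norms; removing those is a Similarity question, not a Hits question.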
Re: Re: Scoring without normalization!
Thanks. I tried overriding Similarity, returning 1 from lengthNorm and queryNorm, and setting it on the IndexSearcher with setSimilarity.

Query: 1 Found: 1540632
Rank: 1 ID: 8157438 Score: 0.9994
3.73650457E11 = weight(title:iron in 159395), product of:
  7.0507255 = queryWeight(title:iron), product of:
    7.0507255 = idf(docFreq=10816)
    1.0 = queryNorm
  5.2994613E10 = fieldWeight(title:iron in 159395), product of:
    1.0 = tf(termFreq(title:iron)=1)
    7.0507255 = idf(docFreq=10816)
    7.5161928E9 = fieldNorm(field=title, doc=159395)

How do I get rid of queryWeight, fieldWeight, and fieldNorm from the scoring? I tried modifying TermQuery without much luck.

On Thu, 15 Jul 2004 Doug Cutting wrote:
> Have you looked at:
> http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Similarity.html
> in particular, at:
> http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Similarity.html#lengthNorm(java.lang.String,%20int)
> http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Similarity.html#queryNorm(float)
>
> Doug
>
> Jones G wrote:
> > Sadly, I am still running into problems. Explain shows the following
> > after the modification.
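A minimal sketch of the Similarity override described above, assuming the 1.4-era API (the class name is made up). One point worth noting: lengthNorm is applied at index time and stored in the index, so overriding it only on the searcher does not change fieldNorm values that are already baked into existing segments; the norms only become 1 for documents indexed with this Similarity.

```java
import org.apache.lucene.search.DefaultSimilarity;

// Sketch: a Similarity that disables the length and query norms,
// leaving tf and idf intact. Hypothetical class name.
public class NoNormSimilarity extends DefaultSimilarity {
    // lengthNorm is invoked at INDEX time and stored per document;
    // overriding it at search time does not rewrite norms already
    // written to the index.
    public float lengthNorm(String fieldName, int numTerms) {
        return 1.0f;
    }

    // queryNorm is applied at search time to the whole query.
    public float queryNorm(float sumOfSquaredWeights) {
        return 1.0f;
    }
}
```

To make the stored fieldNorm come out as 1 as well, the same Similarity would have to be set on the IndexWriter (via setSimilarity) and the documents reindexed.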
Scoring without normalization!
How do I remove document normalization from scoring in Lucene? I just want to stick to TF and IDF. Thanks.
Re: RE: Scoring without normalization!
Thanks! Just what I wanted.

On Thu, 15 Jul 2004 Anson Lau wrote:
> If you don't mind hacking the source: in Hits.java, in method getMoreDocs():
>
> // Comment out the following
> //float scoreNorm = 1.0f;
> //if (length > 0 && scoreDocs[0].score > 1.0f) {
> //  scoreNorm = 1.0f / scoreDocs[0].score;
> //}
> // And just set scoreNorm to 1.
> float scoreNorm = 1.0f;
>
> I don't know if you can do it without going to the src.
>
> Anson
>
> -----Original Message-----
> From: Jones G [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, July 15, 2004 6:52 AM
> To: [EMAIL PROTECTED]
> Subject: Scoring without normalization!
>
> How do I remove document normalization from scoring in Lucene? I just
> want to stick to TF and IDF. Thanks.
One Field!
I have an index with multiple fields. Right now I am using MultiFieldQueryParser to search across them, which means that if the same term occurs in multiple fields, it contributes to the score once per field. Is there any way to treat all the fields in question as one field and score the document accordingly, without having to reindex? Thanks.
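For context, a sketch of the MultiFieldQueryParser usage described above (1.4-era API; the field names are hypothetical). The parser expands the query into one clause per field, which is exactly why a term matching several fields is scored once per field:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.search.Query;

public class MultiFieldSearch {
    public static Query build(String userQuery) throws Exception {
        // One clause is generated per field, e.g. the query "iron"
        // becomes roughly: title:iron abstract:iron
        // so a term present in both fields is scored in each clause.
        String[] fields = { "title", "abstract" };  // hypothetical names
        return MultiFieldQueryParser.parse(
                userQuery, fields, new StandardAnalyzer());
    }
}
```

The usual way to get genuine single-field scoring is a catch-all field populated at index time with the concatenation of the others, and then querying only that field, but that does require reindexing.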