Re: Re: Searching against Database

2004-07-15 Thread Jones G
I don't have any best practices to offer, though I have been using Lucene with MySQL
for a year.

All I do is store a key of some sort in the index
new Field(id, getPK(), true, false, false)

and then relate that to the database in code.
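
A minimal sketch of the "relate that to the database in code" step, assuming the keys have already been read back from the stored field of each hit. The class name, the table name "articles" and the column name "id" are hypothetical; the resulting SQL would be run through a JDBC PreparedStatement with one key bound per placeholder.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: turn primary keys read back from search hits into one SQL lookup. */
final class HitLookup {

    /**
     * Builds a parameterized query for n keys, e.g. "... WHERE id IN (?, ?)".
     * Table and column names are placeholders supplied by the caller.
     */
    static String buildInQuery(String table, String keyColumn, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("need at least one key");
        }
        StringBuilder sql = new StringBuilder("SELECT * FROM ")
                .append(table).append(" WHERE ").append(keyColumn).append(" IN (");
        for (int i = 0; i < n; i++) {
            sql.append(i == 0 ? "?" : ", ?");
        }
        return sql.append(')').toString();
    }

    public static void main(String[] args) {
        // Keys as they might come out of the stored key field of each hit.
        List<String> keys = new ArrayList<>();
        keys.add("11285358");
        keys.add("8157438");
        keys.add("10543103");
        System.out.println(buildInQuery("articles", "id", keys.size()));
    }
}
```

Fetching all hits on a page with one query avoids a database round trip per hit.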

For live Oracle databases, you might consider a different approach.

From what I hear, Oracle lets you call Java from PL/SQL (no experience here), so you
might consider adding code to the triggers that adds and deletes documents in the
index. But modifying the index is usually not as quick as modifying a database, so you
might want to come up with some sort of compromise on this.
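
One possible shape for such a compromise, offered only as a sketch: record each change cheaply when it happens and apply the changes to the index in batches later. Nothing here comes from the thread. The class and method names are hypothetical, and the code that would actually write to the Lucene index is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: collect row changes now, apply them to the index in one batch later. */
final class PendingIndexChanges {

    enum Op { ADD, DELETE }

    // Only the latest operation per primary key is kept.
    private final Map<String, Op> pending = new LinkedHashMap<>();

    /** Called whenever a row changes; cheap, never touches the index. */
    synchronized void record(String key, Op op) {
        pending.remove(key);   // re-inserting moves the key to the end of the batch
        pending.put(key, op);
    }

    /** Number of keys waiting to be applied. */
    synchronized int size() {
        return pending.size();
    }

    /** Hands the collected changes to the caller and starts a new batch. */
    synchronized Map<String, Op> drain() {
        Map<String, Op> batch = new LinkedHashMap<>(pending);
        pending.clear();
        return batch;
    }

    public static void main(String[] args) {
        PendingIndexChanges changes = new PendingIndexChanges();
        changes.record("42", Op.ADD);
        changes.record("42", Op.DELETE);   // supersedes the earlier ADD
        changes.record("7", Op.ADD);
        // A periodic job would drain the batch and update the index once.
        System.out.println(changes.drain());
    }
}
```

The trade-off is that the index lags behind the database by up to one batch interval.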

Perhaps more experienced users on this list will have better insights.

Hope that helps.


On Thu, 15 Jul 2004 lingaraju wrote :
Hello

I am also looking for the same kind of code, as all the information my web
site displays is stored in a database.
An early response would be very helpful.

Thanks and regards
Raju

- Original Message -
 From: Hetan Shah [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Thursday, July 15, 2004 5:56 AM
Subject: Searching against Database


  Hello All,
 
  I have got all the answers from this fantastic mailing list. I have
  another question ;)
 
  What is the best way (best practices) to integrate Lucene with a live
  database, Oracle to be more specific? Any pointers would be very much
  appreciated.
 
  thanks guys.
  -H

Re: RE: Scoring without normalization!

2004-07-15 Thread Jones G
Sadly, I am still running into problems

Explain shows the following after the modification.

Rank: 1 ID: 11285358 Score: 5.5740864E8
5.5740864E8 = product of:
  8.3611296E8 = sum of:
    8.3611296E8 = product of:
      6.6889037E9 = weight(title:iron in 1235940), product of:
        0.12621856 = queryWeight(title:iron), product of:
          7.0507255 = idf(docFreq=10816)
          0.017901499 = queryNorm
        5.2994613E10 = fieldWeight(title:iron in 1235940), product of:
          1.0 = tf(termFreq(title:iron)=1)
          7.0507255 = idf(docFreq=10816)
          7.5161928E9 = fieldNorm(field=title, doc=1235940)
      0.125 = coord(1/8)
    2.7106019E-8 = product of:
      1.08424075E-7 = sum of:
        5.7318403E-9 = weight(abstract:an in 1235940), product of:
          0.03711049 = queryWeight(abstract:an), product of:
            2.073038 = idf(docFreq=1569960)
            0.017901499 = queryNorm
          1.5445337E-7 = fieldWeight(abstract:an in 1235940), product of:
            1.0 = tf(termFreq(abstract:an)=1)
            2.073038 = idf(docFreq=1569960)
            7.4505806E-8 = fieldNorm(field=abstract, doc=1235940)
        1.0269223E-7 = weight(abstract:iron in 1235940), product of:
          0.111071706 = queryWeight(abstract:iron), product of:
            6.2046037 = idf(docFreq=25209)
            0.017901499 = queryNorm
          9.24558E-7 = fieldWeight(abstract:iron in 1235940), product of:
            2.0 = tf(termFreq(abstract:iron)=4)
            6.2046037 = idf(docFreq=25209)
            7.4505806E-8 = fieldNorm(field=abstract, doc=1235940)
      0.25 = coord(2/8)
  0.667 = coord(2/3)
Rank: 2 ID: 8157438 Score: 2.7870432E8
2.7870432E8 = product of:
  8.3611296E8 = product of:
    6.6889037E9 = weight(title:iron in 159395), product of:
      0.12621856 = queryWeight(title:iron), product of:
        7.0507255 = idf(docFreq=10816)
        0.017901499 = queryNorm
      5.2994613E10 = fieldWeight(title:iron in 159395), product of:
        1.0 = tf(termFreq(title:iron)=1)
        7.0507255 = idf(docFreq=10816)
        7.5161928E9 = fieldNorm(field=title, doc=159395)
    0.125 = coord(1/8)
  0.3334 = coord(1/3)
Rank: 3 ID: 10543103 Score: 2.7870432E8
2.7870432E8 = product of:
  8.3611296E8 = product of:
    6.6889037E9 = weight(title:iron in 553967), product of:
      0.12621856 = queryWeight(title:iron), product of:
        7.0507255 = idf(docFreq=10816)
        0.017901499 = queryNorm
      5.2994613E10 = fieldWeight(title:iron in 553967), product of:
        1.0 = tf(termFreq(title:iron)=1)
        7.0507255 = idf(docFreq=10816)
        7.5161928E9 = fieldNorm(field=title, doc=553967)
    0.125 = coord(1/8)
  0.3334 = coord(1/3)
Rank: 4 ID: 8753559 Score: 2.7870432E8
2.7870432E8 = product of:
  8.3611296E8 = product of:
    6.6889037E9 = weight(title:iron in 2563152), product of:
      0.12621856 = queryWeight(title:iron), product of:
        7.0507255 = idf(docFreq=10816)
        0.017901499 = queryNorm
      5.2994613E10 = fieldWeight(title:iron in 2563152), product of:
        1.0 = tf(termFreq(title:iron)=1)
        7.0507255 = idf(docFreq=10816)
        7.5161928E9 = fieldNorm(field=title, doc=2563152)
    0.125 = coord(1/8)
  0.3334 = coord(1/3)

I would like to get rid of all normalizations and just have TF and IDF.
What am I missing?
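
To make the explain output above easier to read, here is a small worked check of how one term's weight is assembled from its leaf values. It only redoes the arithmetic shown for the title:iron clause; the class and method names are made up for illustration, and coord is a further factor applied outside this product.

```java
/** Recomputes one term's weight from the leaf values printed by explain(). */
final class ExplainCheck {

    /** weight = queryWeight * fieldWeight, as the explain tree lays it out. */
    static double weight(double tf, double idf, double queryNorm, double fieldNorm) {
        double queryWeight = idf * queryNorm;       // query side
        double fieldWeight = tf * idf * fieldNorm;  // document side
        return queryWeight * fieldWeight;
    }

    public static void main(String[] args) {
        // Leaf values copied from the title:iron clause of the explain output.
        double tf = 1.0;
        double idf = 7.0507255;
        double queryNorm = 0.017901499;
        double fieldNorm = 7.5161928E9;

        // Should land close to the 6.6889037E9 reported by explain().
        System.out.println(weight(tf, idf, queryNorm, fieldNorm));

        // With both norms forced to 1, only tf * idf * idf remains.
        System.out.println(weight(tf, idf, 1.0, 1.0));
    }
}
```

The huge scores come almost entirely from the fieldNorm factor, and idf enters the product twice, once on each side.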


On Thu, 15 Jul 2004 Anson Lau wrote :
If you don't mind hacking the source:

In Hits.java

In method getMoreDocs()



 // Comment out the following
 //float scoreNorm = 1.0f;
 //if (length > 0 && scoreDocs[0].score > 1.0f) {
 //  scoreNorm = 1.0f / scoreDocs[0].score;
 //}

 // And just set scoreNorm to 1.
 float scoreNorm = 1.0f;


I don't know if you can do it without going to the source.

Anson


-Original Message-
 From: Jones G [mailto:[EMAIL PROTECTED]
Sent: Thursday, July 15, 2004 6:52 AM
To: [EMAIL PROTECTED]
Subject: Scoring without normalization!

How do I remove document normalization from scoring in Lucene? I just want
to stick to TF IDF.

Thanks.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Re: Scoring without normalization!

2004-07-15 Thread Jones G
Thanks. I tried overriding Similarity so that lengthNorm and queryNorm both return 1,
and set it on the IndexSearcher with setSimilarity.

Query: 1 Found: 1540632
Rank: 1 ID: 8157438 Score: 0.9994
3.73650457E11 = weight(title:iron in 159395), product of:
  7.0507255 = queryWeight(title:iron), product of:
    7.0507255 = idf(docFreq=10816)
    1.0 = queryNorm
  5.2994613E10 = fieldWeight(title:iron in 159395), product of:
    1.0 = tf(termFreq(title:iron)=1)
    7.0507255 = idf(docFreq=10816)
    7.5161928E9 = fieldNorm(field=title, doc=159395)

How do I get rid of queryWeight, fieldWeight and fieldNorm in the scoring?

I tried modifying TermQuery without much luck.
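
For reference, a plain TF·IDF value can be computed outside Lucene as a cross-check. The sketch below uses what I understand to be the default tf and idf formulas (square root of the term frequency, and ln(numDocs / (docFreq + 1)) + 1); treat both as assumptions and compare them against the Similarity documentation. The class name and the index size in main are hypothetical, since the thread does not give the number of documents. As far as I know, fieldNorm is read from norms written when the documents were indexed, which would explain why a search-time lengthNorm override leaves it unchanged.

```java
/** Plain tf * idf with no norms, for comparison against explain() output. */
final class RawTfIdf {

    /** Square root of the raw term frequency (assumed default tf). */
    static float tf(int freq) {
        return (float) Math.sqrt(freq);
    }

    /** ln(numDocs / (docFreq + 1)) + 1 (assumed default idf). */
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    /** Score of a single term with every normalization factor left out. */
    static float score(int freq, int docFreq, int numDocs) {
        return tf(freq) * idf(docFreq, numDocs);
    }

    public static void main(String[] args) {
        int numDocs = 4000000;  // hypothetical index size, not taken from the thread
        // termFreq=4 and docFreq=25209 are the abstract:iron values from the thread.
        System.out.println(tf(4));
        System.out.println(score(4, 25209, numDocs));
    }
}
```

Here tf(4) gives 2.0, which matches the tf line that explain prints for termFreq=4.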


On Thu, 15 Jul 2004 Doug Cutting wrote :
Have you looked at:

http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Similarity.html

in particular, at:

http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Similarity.html#lengthNorm(java.lang.String,%20int)
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Similarity.html#queryNorm(float)

Doug


Scoring without normalization!

2004-07-14 Thread Jones G
How do I remove document normalization from scoring in Lucene? I just want to stick to 
TF IDF.

Thanks.

Re: RE: Scoring without normalization!

2004-07-14 Thread Jones G
Thanks! Just what I wanted.

On Thu, 15 Jul 2004 Anson Lau wrote :
If you don't mind hacking the source:

In Hits.java

In method getMoreDocs()



 // Comment out the following
 //float scoreNorm = 1.0f;
 //if (length > 0 && scoreDocs[0].score > 1.0f) {
 //  scoreNorm = 1.0f / scoreDocs[0].score;
 //}

 // And just set scoreNorm to 1.
 float scoreNorm = 1.0f;


I don't know if you can do it without going to the source.

Anson


-Original Message-
 From: Jones G [mailto:[EMAIL PROTECTED]
Sent: Thursday, July 15, 2004 6:52 AM
To: [EMAIL PROTECTED]
Subject: Scoring without normalization!

How do I remove document normalization from scoring in Lucene? I just want
to stick to TF IDF.

Thanks.





One Field!

2004-07-14 Thread Jones G
I have an index with multiple fields. Right now I am using MultiFieldQueryParser to
search the fields. This means that if the same term occurs in multiple fields, it is
scored separately for each field. Is there any way to treat all the fields in question
as one field and score the document accordingly, without having to reindex?

Thanks.