Re: Logic of score method in hits class

2004-07-27 Thread lingaraju
I did what you suggested: I divided all scores by the first score and
multiplied by 100.

I am still not getting exactly what I wanted.
I am searching for the two words "asia cup".
The first three hits contain both words I am searching for, but I got
percentages of 100, 69 and 33 respectively.

I am using:

 String fields[] = new String[2];
 fields[0] = "title";
 fields[1] = "contents";
 Query q = MultiFieldQueryParser.parse(line, fields, analyzer);

 Hits hits = searcher.search(q);
 float sc = hits.score(i);

Thanks in advance
Raju
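
The normalization Doug describes can be sketched in plain Java as follows. This is only an illustration: the class and method names are made up, and the raw scores are invented rather than taken from a real search. Note that the result says how each hit compares to the top hit, not what fraction of the query words matched, so two documents that both contain "asia" and "cup" can still end up with different percentages.

```java
class ScorePercent {
    /** Scales raw scores so that the first (best) hit becomes 100. */
    static float[] toPercent(float[] scores) {
        float[] out = new float[scores.length];
        if (scores.length == 0 || scores[0] <= 0f) {
            return out; // nothing to normalize against
        }
        for (int i = 0; i < scores.length; i++) {
            out[i] = scores[i] / scores[0] * 100f;
        }
        return out;
    }

    public static void main(String[] args) {
        // Invented raw scores, best hit first (as Hits would return them).
        float[] raw = {0.50f, 0.25f, 0.125f};
        for (float p : toPercent(raw)) {
            System.out.println(Math.round(p) + "%");
        }
    }
}
```

In the real application the input array would be filled from hits.score(i) for each hit.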






- Original Message - 
From: Doug Cutting [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Monday, July 26, 2004 11:37 PM
Subject: Re: Logic of score method in hits class


 Lucene scores are not percentages.  They really only make sense compared
 to other scores for the same query.  If you like percentages, you can
 divide all scores by the first score and multiply by 100.

 Doug

 lingaraju wrote:
  Dear  All
 
  How does the score method in the Hits class work (what is its logic)?
  Even for a 100% match the score returned is only 69%.
 
  Thanks and regards
  Raju
 

 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]







Time of last insert

2004-07-27 Thread lingaraju

Dear  All

How can I find out when (last-modified time) the last document was added to
the index?

Thanks and regards
Raju


Re: Boosting documents

2004-07-27 Thread Akmal Sarhan
Hello,

I have followed your suggestion, but I am not sure how to achieve the
following: when I run the search below, I want the score to be calculated
so that documents with a higher number of kids get a better score and
documents with fewer kids get a lower score. Note that I still want to get
all documents back.

Thanks for any input.

import java.io.IOException;

import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.DefaultSimilarity;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searcher;
import org.apache.lucene.store.RAMDirectory;

public class TestMatching
{

    protected float f;

    public static void main(String[] args) throws IOException, ParseException
    {

        RAMDirectory store = new RAMDirectory();
        IndexWriter writer = new IndexWriter(store, new SimpleAnalyzer(), true);

        Field f1 = Field.Text("field", "word");
        Field kids1 = Field.Keyword("kids", "2");
        Field kids2 = Field.Keyword("kids", "3");
        Field kids3 = Field.Keyword("kids", "4");

        Document d1 = new Document();
        Document d2 = new Document();
        Document d3 = new Document();

        d1.add(f1);
        d2.add(f1);
        d3.add(f1);
        d1.add(kids1);
        d2.add(kids2);
        d3.add(kids3);

        d1.add(f1);
        writer.addDocument(d1);
        writer.addDocument(d2);
        writer.addDocument(d3);

        writer.optimize();
        writer.close();

        Searcher s = new IndexSearcher(store);

        s.setSimilarity(new DefaultSimilarity() {

            public float idf(Term term, Searcher searcher) throws IOException
            {
                String string = term.text();
                String string2 = term.field();
                float f = 0.0f;
                if (term.field().equals("kids"))
                {
                    // and now ??
                } else
                {
                    f = idf(searcher.docFreq(term), searcher.maxDoc());
                }

                return f;
            }
        });
        Query query = QueryParser.parse("field:word kids:5", "field",
                new StandardAnalyzer());
        Hits hits = s.search(query);

        for (int i = 0; i < hits.length(); ++i)
        {
            Document doc = hits.doc(i);
            System.out.println(i + " " + hits.score(i));
        }

    }
}
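
One possible way to fill in the "and now ??" branch is to derive the weight from the term text itself. The sketch below is plain Java with made-up names (KidsWeight, weightFor) and an arbitrary logarithmic formula; it is not part of Lucene. It also rests on an assumption worth checking: idf(Term, Searcher) receives the query's term, so a clause such as kids:5 can only contribute to documents whose kids field is exactly 5. To rank all three sample documents by their kids value, the query would presumably need one optional clause per value, for example field:word kids:2 kids:3 kids:4.

```java
class KidsWeight {
    /** Hypothetical weight for a kids:N term: grows with N and never drops below 1. */
    static float weightFor(String termText) {
        try {
            int kids = Integer.parseInt(termText);
            // Logarithmic growth keeps large values from dominating the score.
            return 1.0f + (float) Math.log(1 + Math.max(kids, 0));
        } catch (NumberFormatException e) {
            return 1.0f; // non-numeric value: neutral weight
        }
    }

    public static void main(String[] args) {
        for (String kids : new String[] {"2", "3", "4"}) {
            System.out.println("kids:" + kids + " -> " + weightFor(kids));
        }
    }
}
```

Inside the anonymous DefaultSimilarity above, the kids branch could then read f = KidsWeight.weightFor(term.text());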

On Mon, 26.07.2004 at 20:14, Doug Cutting wrote:
 Rob Clews wrote:
  I want to do the same, set a boost for a field containing a date that
  lowers as the date is further from now, is there any way I could do
  this?
 
 You could implement Similarity.idf(Term, Searcher) to, when 
 Term.field().equals("date"), return a value that is greater for more 
 recent dates.
 
 Doug
 



Re: Caching of TermDocs

2004-07-27 Thread John Patterson
The caching by TermScorer of the next 32 Docs is a way to speed up the
serial (in order) reading of docs from the TermDocs object (probably coming
direct from disk).

I would like to hold a significant amount of the index in memory but use the
disk index as a spill over.  Obviously the best situation is to hold in
memory only the information that is likely to be used again soon.  It seems
that caching TermDocs would allow popular search terms to be searched more
efficiently while the less common terms would need to be read from disk.

Has anyone else done this?  Know of a better approach?

- Original Message - 
From: Paul Elschot [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Tuesday, July 27, 2004 3:07 AM
Subject: Re: Caching of TermDocs


 On Monday 26 July 2004 21:41, John Patterson wrote:

  Is there any way to cache TermDocs?  Is this a good idea?

 Lucene does this internally by buffering
 up to 32 document numbers in advance for a query Term.
 You can view the details here in case you're interested:

http://cvs.apache.org/viewcvs.cgi/jakarta-lucene/src/java/org/apache/lucene/search/TermScorer.java
 It uses the TermDocs.read() method to fill a buffer of document numbers.

 Is this what you had in mind?

 Regards,
 Paul





Re: updating the index created for database search

2004-07-27 Thread lingaraju
I tried, but I am missing something.
Could you please show me the syntax for using a TermQuery to check whether a
document is present in the index, based on a key field such as OID?

- Original Message - 
From: Daniel Naber [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Monday, July 26, 2004 5:21 PM
Subject: Re: updating the index created for database search


 On Monday 26 July 2004 13:31, lingaraju wrote:

  If it is a new record, through which class do we check whether that record
  is present in the index?

 Just search for the id with a TermQuery. If you get a hit, the record is
 in the index already.





Re: Time of last insert

2004-07-27 Thread Erik Hatcher
On Jul 27, 2004, at 5:15 AM, Otis Gospodnetic wrote:
 There is no API for that.

Yeah there is!  :)

IndexReader.lastModified()

I borrowed that from LIMO's .jsp page, by the way.

Erik
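
For readers without Lucene at hand, roughly the same information can be read from the file system. The sketch below is plain Java and rests on an assumption about the on-disk layout: that a file-based index keeps a file named "segments" in its directory which is rewritten whenever the index changes. The class name IndexTimestamp is made up. The timestamp reflects the last change to the index as a whole, not specifically the last added document.

```java
import java.io.File;
import java.util.Date;

class IndexTimestamp {
    /** Returns the modification time in ms since the epoch, or 0 if the file is missing. */
    static long lastModified(File indexDir) {
        // "segments" is an assumed file name, see the note above.
        return new File(indexDir, "segments").lastModified();
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        long millis = lastModified(dir);
        System.out.println(millis == 0
                ? "no segments file found in " + dir
                : "index last modified: " + new Date(millis));
    }
}
```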


Re: Caching of TermDocs

2004-07-27 Thread Doug Cutting
John Patterson wrote:
 I would like to hold a significant amount of the index in memory but use the
 disk index as a spill over.  Obviously the best situation is to hold in
 memory only the information that is likely to be used again soon.  It seems
 that caching TermDocs would allow popular search terms to be searched more
 efficiently while the less common terms would need to be read from disk.

The operating system already caches recent disk i/o.  So what you'd save 
primarily would be the overhead of parsing the data.  However the parsed 
form, a sequence of docNo and freq ints, is nearly eight times as large 
as its compressed size in the index.  So your cache would consume a lot 
of memory.

Whether this provides much overall speedup depends on the distribution 
of common terms in your query traffic.  If you have a few terms that are 
searched very frequently then it might pay off.  In my experience with 
general-purpose search engines this is not usually the case: folks seem 
to use rarer words in queries than they do in ordinary text.  But in 
some search applications perhaps the traffic is more skewed.  Only some 
experiments would tell for sure.

Doug
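
To make the trade-off concrete, here is a minimal sketch of a bounded cache for parsed postings, written in plain Java without any Lucene classes. The names PostingsCache and estimateBytes are hypothetical, and the cached value is assumed to be an int array of interleaved docNo and freq values. The estimate uses the figure from the message above: two 4-byte ints per posting in parsed form.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PostingsCache extends LinkedHashMap<String, int[]> {
    private final int maxTerms;

    PostingsCache(int maxTerms) {
        super(16, 0.75f, true); // true = access order, so get() refreshes an entry
        this.maxTerms = maxTerms;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, int[]> eldest) {
        return size() > maxTerms; // evict the least recently used term
    }

    /** Rough heap estimate: two 4-byte ints (docNo, freq) per posting. */
    static long estimateBytes(long postings) {
        return postings * 8L;
    }

    public static void main(String[] args) {
        PostingsCache cache = new PostingsCache(2);
        cache.put("asia", new int[] {1, 3, 7, 1}); // docNo, freq, docNo, freq
        cache.put("cup", new int[] {1, 1});
        cache.get("asia");                          // "asia" is now most recently used
        cache.put("lucene", new int[] {4, 2});      // evicts "cup"
        System.out.println(cache.keySet());
        System.out.println(estimateBytes(1000000L) + " bytes for 1M postings");
    }
}
```

Bounding the cache by number of terms is the simplest choice; bounding it by total postings would track memory use more closely, since a single common term can have a very long postings list.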


Re: Time of last insert

2004-07-27 Thread lingaraju
But that method is deprecated and replaced by getCurrentVersion().

- Original Message - 
From: Erik Hatcher [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Tuesday, July 27, 2004 6:25 PM
Subject: Re: Time of last insert


 
 On Jul 27, 2004, at 5:15 AM, Otis Gospodnetic wrote:
  There is no API for that.
 
 Yeah there is!  :)
 
 IndexReader.lastModified()
 
 I borrowed that from LIMO's .jsp page, by the way.
 
 Erik
 
 



storing a directory (indexes) in a database [Our Ref:CPTB5FAD]

2004-07-27 Thread mchavda
hi there,

There seems to be lots of talk about storing Lucene directories in a 
relational DB, but I haven't found too many links.

I've looked at JDBCDirectory (http://ppinew.mnis.com/jdbcdirectory/) but 
get a NullPointerException when trying to get an index-reader on a newly 
created directory.

Has anyone else had success with JDBCDirectory?

Are there alternative implementations out there?

thanks,
Manoj






RE: write lock: cleaning an index

2004-07-27 Thread Wu, Calvin
It looks like you didn't close the file handle properly.
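
One way to make sure the writer is closed when the application is stopped with a plain kill is a JVM shutdown hook. The sketch below is generic Java, not Lucene-specific: CloseOnExit and hookFor are made-up names, and the lambda stands in for whatever call closes the real writer. Shutdown hooks run on normal exit and on a plain kill (SIGTERM), but not on kill -9, so a lock file can still be left behind in that case.

```java
class CloseOnExit {
    /** Builds (but does not register) a hook thread that closes the resource. */
    static Thread hookFor(AutoCloseable resource) {
        return new Thread(() -> {
            try {
                resource.close();
            } catch (Exception e) {
                System.err.println("close failed: " + e);
            }
        });
    }

    public static void main(String[] args) {
        // Stand-in for the real writer; replace the lambda with the writer's close call.
        AutoCloseable writer = () -> System.out.println("writer closed");
        Runtime.getRuntime().addShutdownHook(hookFor(writer));
        System.out.println("working...");
    }
}
```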



-Original Message-
From: Ravi Rao [mailto:[EMAIL PROTECTED] 
Sent: Friday, July 23, 2004 12:57 PM
To: [EMAIL PROTECTED]
Subject: write lock: cleaning an index


All,

I have an application that has one IndexWriter.  Once in a while the
enclosing application is taken down with a kill and IndexWriter leaves a
lock file behind.  Other than removing the lock file, is there anything
else I can do to clean the index?

The only general solution I can think of is to index to a temporary
index and then every so often merge it with the master index, which
cannot be allowed to be corrupted.  In this scheme we lose only the
temporary index rather than the master index.

Many thanks,
-- 
Ravi/







Re: Caching of TermDocs

2004-07-27 Thread John Patterson
Cool.  I'll give it a try.  Looks like extending FilterIndexReader is the
way to go.  Or possibly I could cache the compressed form at a lower level
getting the best of both worlds.  I'll look into both ways, profile the app,
and post my results.

- Original Message - 
From: Doug Cutting [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Tuesday, July 27, 2004 8:33 PM
Subject: Re: Caching of TermDocs


 John Patterson wrote:
  I would like to hold a significant amount of the index in memory but use
  the disk index as a spill over.  Obviously the best situation is to hold
  in memory only the information that is likely to be used again soon.  It
  seems that caching TermDocs would allow popular search terms to be
  searched more efficiently while the less common terms would need to be
  read from disk.

 The operating system already caches recent disk i/o.  So what you'd save
 primarily would be the overhead of parsing the data.  However the parsed
 form, a sequence of docNo and freq ints, is nearly eight times as large
 as its compressed size in the index.  So your cache would consume a lot
 of memory.

 Whether this provides much overall speedup depends on the distribution
 of common terms in your query traffic.  If you have a few terms that are
 searched very frequently then it might pay off.  In my experience with
 general-purpose search engines this is not usually the case: folks seem
 to use rarer words in queries than they do in ordinary text.  But in
 some search applications perhaps the traffic is more skewed.  Only some
 experiments would tell for sure.

 Doug




Re: Phrase Query

2004-07-27 Thread Hetan Shah
Works for me.
Here is what I am striving to achieve.
phraseString = request.getParameter("phrase");
if (phraseString.length() > 0) {
    phraseQueryString = "\"" + phraseString + "\"";
    phraseQuery = true;
    queryString = phraseQueryString;
}
if (phraseQuery) {
    PhraseQuery pQuery = new PhraseQuery();
    pQuery.add(new Term("contents", phraseString));
    pQuery.setSlop(0);
    QueryParser qP = new QueryParser();
    query = qP.parse(phraseString);
}
This is a piece of the code. What I intend to do is this: if any keyword is 
entered in the Exact Phrase field of the form, I want to use the phrase 
query; otherwise I want to use a regular Query.

Please correct the code if you think it is not correct. I am still 
learning about search and Lucene in general.

thanks.
-H
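
Two string-handling steps in the snippet above can be sketched in plain Java. The helper names (PhraseQuoting, quote, words) are hypothetical. The first helper wraps the user's input in double quotes so that a query parser would treat it as a phrase; embedded quotes are simply dropped here, which is a simplification. The second splits the phrase into words, on the understanding that PhraseQuery.add expects one Term per word rather than the whole phrase in a single Term. Lowercasing assumes the field was indexed with a lowercasing analyzer.

```java
class PhraseQuoting {
    /** Wraps the input in double quotes, dropping any quotes the user typed. */
    static String quote(String phrase) {
        return "\"" + phrase.replace("\"", " ").trim() + "\"";
    }

    /** Splits the phrase into lowercase words; returns an empty array for blank input. */
    static String[] words(String phrase) {
        String trimmed = phrase.trim().toLowerCase();
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(quote("asia cup"));
        for (String w : words("Asia  Cup")) {
            System.out.println("term: " + w); // each word would become its own Term
        }
    }
}
```

Note also that in the original snippet the PhraseQuery is built but never used, because query is assigned from the parser afterwards; only one of the two approaches is needed.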
Erik Hatcher wrote: