RE: ANT +BUILD + LUCENE

2004-09-14 Thread Gerard Sychay
Hi, I've used the following Ant targets for build scripts that required platform-dependent work. In the example here, the property catalina.home is set according to the platform we're running on. You can adapt as needed. target name=platform description=Sets properties based on platform

Re: boost keywords

2004-08-12 Thread Gerard Sychay
Well, there is always the Lucene wiki. There's not a patterns page per se, but you could start one. http://wiki.apache.org/jakarta-lucene Leos Literak [EMAIL PROTECTED] 08/12/04 02:02AM (It would be useful if there were a Lucene patterns page. E.g. if you wish to do A, then use B practice)

Re: Searching against Database

2004-07-19 Thread Gerard Sychay
You might run into problems with having too many Fields by treating each record as a Document and each column as a Field in that Document. An alternative would be to index each cell of the table as a Document and keep metadata (primary key, column name, table name, etc.) as stored,
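
A minimal sketch of the cell-per-Document idea described in the message above, in plain Java without any Lucene calls. The class and method names (`CellDocumentSketch`, `CellDoc`, `toCellDocs`) are hypothetical and only illustrate the mapping: one entry per table cell, with the primary key, column name, and table name carried alongside the cell text. How those values would be added to an actual index is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class CellDocumentSketch {
    // One "document" per table cell, with the metadata needed to find the row again.
    static final class CellDoc {
        final String table;
        final String primaryKey;
        final String column;
        final String text;

        CellDoc(String table, String primaryKey, String column, String text) {
            this.table = table;
            this.primaryKey = primaryKey;
            this.column = column;
            this.text = text;
        }
    }

    // Turns one database row (column name -> cell value) into one CellDoc per non-null cell.
    static List<CellDoc> toCellDocs(String table, String primaryKey, Map<String, String> row) {
        List<CellDoc> docs = new ArrayList<>();
        for (Map.Entry<String, String> cell : row.entrySet()) {
            if (cell.getValue() == null) {
                continue; // nothing to index for an empty cell
            }
            docs.add(new CellDoc(table, primaryKey, cell.getKey(), cell.getValue()));
        }
        return docs;
    }
}
```

With this layout a hit on any cell can be mapped back to its row through the primary key, at the cost of producing many more, much smaller documents.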

Re: similarity of two texts - another question

2004-06-02 Thread Gerard Sychay
Hmm, the term vector does not have to consist of only term frequencies, does it? To give weight to rare terms, could you create a term vector of (TF*IDF) values for each term? Then, a distance function would measure how many terms two vectors have in common, giving weight to how many rare terms
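
A rough sketch of the TF*IDF term vector idea from the message above, in plain Java with no Lucene dependency. The choices here are assumptions, not something the thread specifies: IDF is taken as ln(N / df), and cosine similarity stands in for the distance function. All names (`TfIdfSketch`, `idf`, `tfIdfVector`, `cosine`) are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

class TfIdfSketch {
    // Inverse document frequency per term, assumed here to be ln(N / df).
    static Map<String, Double> idf(List<List<String>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum); // count each term once per document
            }
        }
        Map<String, Double> idf = new HashMap<>();
        for (Map.Entry<String, Integer> e : df.entrySet()) {
            idf.put(e.getKey(), Math.log((double) docs.size() / e.getValue()));
        }
        return idf;
    }

    // Term vector of TF*IDF weights for one tokenized document.
    static Map<String, Double> tfIdfVector(List<String> tokens, Map<String, Double> idf) {
        Map<String, Integer> tf = new HashMap<>();
        for (String term : tokens) {
            tf.merge(term, 1, Integer::sum);
        }
        Map<String, Double> vector = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            vector.put(e.getKey(), e.getValue() * idf.getOrDefault(e.getKey(), 0.0));
        }
        return vector;
    }

    // Cosine similarity: shared terms contribute in proportion to their weights,
    // so rare shared terms count for more than common ones.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (double w : b.values()) {
            normB += w * w;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // an all-zero vector is treated as similar to nothing
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

A term that appears in every document gets weight zero under this IDF, so two texts that share only such terms score 0, which is the rare-term weighting the message asks about.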

RE: multivalue fields

2004-05-12 Thread Gerard Sychay
I don't know if it will help, but take a look at the following email and enclosing thread from a few weeks ago. http://nagoya.apache.org/eyebrowse/[EMAIL PROTECTED]msgNo=7737 Ryan Sonnek [EMAIL PROTECTED] 05/11/04 12:40PM using lucene 1.3-final, it appears to only search the first field with

Re: Mixing database and lucene searches

2004-05-11 Thread Gerard Sychay
Eric Jain [EMAIL PROTECTED] 05/11/04 04:47AM Hits hits = searcher.search(new TermQuery(text, foo) Set hitPKs = new Set(); for each doc in hits: hitPKs.put(doc.getField(pk)) Retrieving even one custom field for every document of a possibly large data set can end up being very

Re: Understanding Boolean Queries

2004-04-30 Thread Gerard Sychay
FWIW, I'll relate a general note from my brief experience. I try to structure the index to avoid the need for boolean queries as much as possible, in order to avoid issues like yours. For example, I was indexing dozens of columns from a database table. Each database row was a document, each

Re: Count for a keyword occurrence in a file

2004-04-30 Thread Gerard Sychay
I had the same need recently. Specifically, I wanted the ability to display along with the results something like: - The query jra occurred 1000 times in 600 documents. For simple queries, the IndexReader.docFreq(Term) and IndexReader.termDocs(Term) methods are the way to go. But for like
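
To make the two numbers in that summary line concrete, here is a small plain-Java sketch of the arithmetic for a single term. It deliberately uses a toy in-memory postings map rather than Lucene; in Lucene the document count would come from IndexReader.docFreq(Term), and the per-document counts would presumably be summed while iterating what IndexReader.termDocs(Term) returns. The names `TermCountSketch`, `buildPostings`, and `summary` are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TermCountSketch {
    // Toy postings: term -> (document id -> occurrences of the term in that document).
    static Map<String, Map<Integer, Integer>> buildPostings(List<List<String>> docs) {
        Map<String, Map<Integer, Integer>> postings = new HashMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            for (String term : docs.get(docId)) {
                postings.computeIfAbsent(term, k -> new HashMap<>())
                        .merge(docId, 1, Integer::sum);
            }
        }
        return postings;
    }

    // Number of documents containing the term.
    static int docFreq(Map<String, Map<Integer, Integer>> postings, String term) {
        Map<Integer, Integer> perDoc = postings.get(term);
        return perDoc == null ? 0 : perDoc.size();
    }

    // Total occurrences of the term across all documents.
    static int totalOccurrences(Map<String, Map<Integer, Integer>> postings, String term) {
        Map<Integer, Integer> perDoc = postings.get(term);
        if (perDoc == null) {
            return 0;
        }
        int total = 0;
        for (int freq : perDoc.values()) {
            total += freq;
        }
        return total;
    }

    // Formats the summary line in the style quoted in the message.
    static String summary(Map<String, Map<Integer, Integer>> postings, String term) {
        return String.format("The query %s occurred %d times in %d documents.",
                term, totalOccurrences(postings, term), docFreq(postings, term));
    }
}
```

This only covers the single-term case that the message calls simple; it says nothing about how more complex queries would be counted.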

Re: Adding duplicate Fields to Documents

2004-04-26 Thread Gerard Sychay
that are not tokenised, are stored separately. Someone more qualified can surely give you more details. You can look at your index with Luke; it might be insightful. sv On Thu, 22 Apr 2004, Gerard Sychay wrote: Hello, I am wondering what happens when you add two Fields with the same name

Re: Adding duplicate Fields to Documents

2004-04-26 Thread Gerard Sychay
, keyword1) and (field_name, keyword2), using doc.get(field_name) always returns keyword2, the last value added. Of course, I can't really think of a scenario where this would be a problem. Thanks for the help! Gerard Sychay 04/26/04 01:57PM Luke is a good idea. I'll also just write a simple

Adding duplicate Fields to Documents

2004-04-23 Thread Gerard Sychay
Hello, I am wondering what happens when you add two Fields with the same name to a Document. The API states that if the fields are indexed, their text is treated as though appended. This much makes sense. But what about the following two cases: - Adding two fields with the same name that are

Re: Does a RAMDirectory ever need to merge segments... (performance issue)

2004-04-21 Thread Gerard Sychay
I've always wondered about this too. To put it another way, how does mergeFactor affect an IndexWriter backed by a RAMDirectory? Can I set mergeFactor to the highest possible value (given the machine's RAM) in order to avoid merging segments? Kevin A. Burton [EMAIL PROTECTED] 04/20/04 04:40AM