Hi,
I've used the following Ant targets for build scripts that required
platform dependent work. In the example here, the property
catalina.home is set according to what platform we're running on. You
can adapt as needed.
<target name="platform" description="Sets properties based on platform">
Well, there is always the Lucene wiki. There's not a patterns page per
se, but you could start one.
http://wiki.apache.org/jakarta-lucene
Leos Literak [EMAIL PROTECTED] 08/12/04 02:02AM
(It would be useful if there were a Lucene patterns
page. E.g. if you wish to do A, then use practice B.)
You might run into problems with having too many Fields by treating each
record as a Document and each column as a Field in that Document. An
alternative would be to index each cell of the table as a Document and
store and keep metadata (primary key, column name, table name, etc.) as
stored,
Hmm, the term vector does not have to consist of only term frequencies,
does it? To give weight to rare terms, could you create a term vector of
(TF*IDF) values for each term? Then, a distance function would measure
how many terms two vectors have in common, giving weight to how many
rare terms
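
As a rough illustration of the TF*IDF idea above, here is a minimal sketch in plain Java. It does not use Lucene's term vector API; the class name TfIdfSketch, its method names, and the smoothed IDF formula (ln(N/df) + 1) are all assumptions made for this example. It also swaps in cosine similarity as the comparison function, which is only one possible choice: shared terms contribute to the score in proportion to their weights, so rare shared terms count for more than common ones.

```java
import java.util.*;

// Hypothetical sketch: TF*IDF-weighted term vectors compared by cosine similarity.
// Plain Java collections only; nothing here is part of the Lucene API.
public class TfIdfSketch {

    // IDF per term, assumed here to be ln(N / df) + 1 so that a term
    // occurring in every document still keeps a weight of 1.
    static Map<String, Double> idf(List<List<String>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            for (String term : new HashSet<>(doc)) {   // count each term once per document
                df.merge(term, 1, Integer::sum);
            }
        }
        Map<String, Double> idf = new HashMap<>();
        for (Map.Entry<String, Integer> e : df.entrySet()) {
            idf.put(e.getKey(), Math.log((double) docs.size() / e.getValue()) + 1.0);
        }
        return idf;
    }

    // Term vector holding (raw term frequency * IDF) for each term in the document.
    static Map<String, Double> tfIdfVector(List<String> doc, Map<String, Double> idf) {
        Map<String, Double> vec = new HashMap<>();
        for (String term : doc) {
            vec.merge(term, 1.0, Double::sum);
        }
        for (Map.Entry<String, Double> e : vec.entrySet()) {
            e.setValue(e.getValue() * idf.getOrDefault(e.getKey(), 1.0));
        }
        return vec;
    }

    // Cosine similarity: only shared terms add to the dot product.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (double v : b.values()) {
            normB += v * v;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;   // an empty vector is similar to nothing
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
            Arrays.asList("the", "lucene", "index"),
            Arrays.asList("the", "lucene", "query"),
            Arrays.asList("the", "ant", "build"));
        Map<String, Double> idf = idf(docs);
        Map<String, Double> d0 = tfIdfVector(docs.get(0), idf);
        Map<String, Double> d1 = tfIdfVector(docs.get(1), idf);
        Map<String, Double> d2 = tfIdfVector(docs.get(2), idf);
        System.out.println("sim(d0, d1) = " + cosine(d0, d1));
        System.out.println("sim(d0, d2) = " + cosine(d0, d2));
    }
}
```

In this toy corpus the first two documents share the rarer term "lucene" as well as "the", while the first and third share only "the", so the first pair should score higher.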
I don't know if it will help, but take a look at the following email and
enclosing thread from a few weeks ago.
http://nagoya.apache.org/eyebrowse/[EMAIL PROTECTED]msgNo=7737
Ryan Sonnek [EMAIL PROTECTED] 05/11/04 12:40PM
using lucene 1.3-final, it appears to only search the first field with
Eric Jain [EMAIL PROTECTED] 05/11/04 04:47AM
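// Collect the primary keys of all documents matching the term query
Hits hits = searcher.search(new TermQuery(new Term("text", "foo")));
Set hitPKs = new HashSet();
for (int i = 0; i < hits.length(); i++) {
    hitPKs.add(hits.doc(i).get("pk"));
}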
Retrieving even one custom field for every document of a possibly large
data set can end up being very
FWIW, I'll relate a general note from my brief experience. I try to
structure the index to avoid the need for boolean queries as much as
possible, in order to avoid issues like yours.
For example, I was indexing dozens of columns from a database table.
Each database row was a document, each
I had the same need recently. Specifically, I wanted the ability to
display along with the results something like:
- The query jra occurred 1000 times in 600 documents.
For simple queries, the IndexReader.docFreq(Term) and
IndexReader.termDocs(Term) methods are the way to go. But for like
that are not
tokenised, are stored separately. Someone more qualified can surely give
you more details.
You can look at your index with Luke, it might be insightful.
sv
On Thu, 22 Apr 2004, Gerard Sychay wrote:
Hello,
I am wondering what happens when you add two Fields with the same name
, keyword1) and (field_name,
keyword2), using doc.get(field_name) always returns keyword2, the
last value added. Of course, I can't really think of a scenario where
this would be a problem.
Thanks for the help!
Gerard Sychay 04/26/04 01:57PM
Luke is a good idea. I'll also just write a simple
Hello,
I am wondering what happens when you add two Fields with the same name to
a Document. The API states that if the fields are indexed, their text
is treated as though appended. This much makes sense. But what about
the following two cases:
- Adding two fields with the same name that are
I've always wondered about this too. To put it another way, how does
mergeFactor affect an IndexWriter backed by a RAMDirectory? Can I set
mergeFactor to the highest possible value (given the machine's RAM) in
order to avoid merging segments?
Kevin A. Burton [EMAIL PROTECTED] 04/20/04 04:40AM
12 matches