From: Sunil Zanjad [mailto:[EMAIL PROTECTED]]
Indexes left in an inconsistent state on crash (I don't remember who
I believe that even I have reported it. This happens on
abrupt exit of the JVM
To do this I had one thread updating a directory containing
many .txt files and
From: Lee Mallabone [mailto:[EMAIL PROTECTED]]
I'm trying to implement this and should be able to contribute any
successful results, but I need to produce context on a per-field basis.
E.g., if I got a token hit in the text body of a document, but the first
hit token was a word in the section
This should work. You should be able to find an un-tokenized field
containing spaces with a TermQuery. Nothing should ever tokenize the
string.
Can you please supply a simple, self-contained example showing that this
does not work?
Thanks,
Doug
-Original Message-
From: Winton
From: Scott Ganyo [mailto:[EMAIL PROTECTED]]
How difficult would it be to get BooleanQuery to do a
standalone NOT, do you
suppose? That would be very useful in my case.
It would not be that difficult, but it would make queries slow. All
documents not containing the term would need to be
Can folks please try to include complete, self-contained test cases when
submitting bugs? It's not that hard, and makes it much easier to figure out
what is going on.
For example, I have attached a complete, self-contained test case for the
bug reported below. It only took 50 lines.
From: Paul Friedman [mailto:[EMAIL PROTECTED]]
It looks like there is a bug (besides the StandardAnalyzer
parsing 20-35 as a single term). The query in your example:
search(searcher, analyzer, FirstName:[a-k]);
is not finding the correct document. It is finding doc2, it
org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:114)
org.apache.lucene.index.TermInfosReader.scanEnum(TermInfosReader.java:166)
I've attached the whole trace as gzipped.txt
regards,
Anders Nielsen
-Original Message-
From: Doug Cutting [mailto:[EMAIL PROTECTED]]
Sent: 10 November 2001 04:35
From: Anders Nielsen [mailto:[EMAIL PROTECTED]]
hmm, I seem to be getting a different number of hits when I
use the files
you sent out.
Please provide more information! Is it larger or smaller than before? By
how much? What differences show up in the hits? That's a terrible bug
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
I think this still works if the document numbers continue to increase
by one when documents are added incrementally.
Does anyone know if this is true (I haven't looked at the code yet).
Yes, that is true, so long as you do not delete
From: New, Cecil (GEAE) [mailto:[EMAIL PROTECTED]]
this is exactly what I was doing. Store=false, index=true,
and token=false.
It appeared to work ok, but searches *never* returned any hits.
That's why I suspect it is a bug.
If you think this is a bug, please submit a test case, as
If you are performing additions and deletions then you should serially
create an IndexReader to do deletions, close it, then create an IndexWriter
to do additions, close it, and so on. Note that typically one will use a
different IndexReader for deletions than is used for searching, so that
From: Winton Davies [mailto:[EMAIL PROTECTED]]
I have 4 million documents... I could:
Split these into 4 x 1 million document indexes and then send a
query to 4 Lucene processes ? At the end I would have to sort the
results by relevance.
Question for Doug or any other
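The splitting approach Winton describes, searching several per-shard indexes and combining the results, can be sketched in plain Java. This is a toy model, not Lucene API: each shard's hits are assumed to arrive already sorted by descending score, and the class and method names are illustrative.

```java
import java.util.*;

// Sketch: merge per-shard hit lists (each already sorted by descending
// score) into one global top-k ranking using a max-heap.
public class ShardMerge {
    public static List<Double> topK(List<List<Double>> shards, int k) {
        // Heap entries: { score, shardIndex, positionInShard }
        PriorityQueue<double[]> heap =
            new PriorityQueue<>((a, b) -> Double.compare(b[0], a[0]));
        for (int s = 0; s < shards.size(); s++) {
            if (!shards.get(s).isEmpty()) {
                heap.add(new double[] { shards.get(s).get(0), s, 0 });
            }
        }
        List<Double> out = new ArrayList<>();
        while (out.size() < k && !heap.isEmpty()) {
            double[] top = heap.poll();
            out.add(top[0]);
            int s = (int) top[1], pos = (int) top[2] + 1;
            if (pos < shards.get(s).size()) {
                heap.add(new double[] { shards.get(s).get(pos), s, pos });
            }
        }
        return out;
    }
}
```

Because each shard list is already sorted, only one candidate per shard needs to sit in the heap at a time.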
From: New, Cecil (GEAE) [mailto:[EMAIL PROTECTED]]
I have noticed that when I kill/interrupt an indexing process, that it
leaves a lock file, preventing further indexing.
This raises a couple of questions:
a. When I simply delete the file and restart the indexing, it
seems to work.
Is
TermDocs are ordered by document number. It would not be easy to change
this.
Doug
-Original Message-
From: Winton Davies [mailto:[EMAIL PROTECTED]]
Sent: Thursday, November 29, 2001 11:12 AM
To: Lucene Users List
Subject: Re: Parallelising a query...
Hi again
From: Brook, James [mailto:[EMAIL PROTECTED]]
I am trying to use the 'lucene-1.2-rc1.jar' with a WebObjects 4.5
application, but having problems. WebObjects uses Java 1.1.8.
I read on the
jGuru Lucene FAQ that Lucene should work with this version of
Java. Is this
correct?
It should,
From: Scott Ganyo [mailto:[EMAIL PROTECTED]]
We're having a heck of a time with too many file handles
around here. When
we create large indexes, we often get thousands of temporary
files in a given index!
Thousands, eh? That seems high.
The maximum number of segments should be
From: Scott Ganyo [mailto:[EMAIL PROTECTED]]
Thanks for the detailed information, Doug! That helps a lot.
Based on what you've said and on taking a closer look at the
code, it looks
like by setting mergeFactor and maxMergeDocs to
Integer.MAX_VALUE, an entire
index will be built in a
Lucene counts the same string in different fields as a different term. In
other words, a term is composed of a field and a string.
Doug
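Doug's point, that a term's identity includes its field, can be sketched in plain Java. This is an illustrative stand-in, not Lucene's actual Term class:

```java
import java.util.Objects;

// Sketch: "title:apple" and "body:apple" are distinct terms because a
// term is a (field, text) pair, not just a string.
public class FieldedTerm {
    final String field, text;
    FieldedTerm(String field, String text) {
        this.field = field;
        this.text = text;
    }
    @Override public boolean equals(Object o) {
        if (!(o instanceof FieldedTerm)) return false;
        FieldedTerm t = (FieldedTerm) o;
        return field.equals(t.field) && text.equals(t.text);
    }
    @Override public int hashCode() { return Objects.hash(field, text); }
}
```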
-Original Message-
From: Joshua O'Madadhain [mailto:[EMAIL PROTECTED]]
Sent: Saturday, December 01, 2001 6:55 PM
To: [EMAIL PROTECTED]
Subject:
In short, this is not currently supported, but might be someday.
For more details, see my recent response to a message with subject RE: Near
without slop.
Doug
-Original Message-
From: Tom Barrett [mailto:[EMAIL PROTECTED]]
Sent: Monday, December 03, 2001 3:42 PM
To: [EMAIL
From: Paddy Clark [mailto:[EMAIL PROTECTED]]
My current NEAR solution is to modify the query parser to build a
PhraseQuery from the terms surrounding NEAR and set the slop
correctly. This works for this kind of query:
Bob NEAR Jim
The problem comes when I try
microsoft NEAR app*
Kelvin,
I don't see "Powered by Lucene" on your results pages:
http://www.relevanz.com/Search?query=media
If you add this, we can add you to the Powered by Lucene page:
http://jakarta.apache.org/lucene/docs/powered.html
What other sites should be added to this page?
Doug
-Original
From: Ype Kingma [mailto:[EMAIL PROTECTED]]
I'm creating a filter from a set of terms that are read from
a file, and I find that IndexReader.termDocs(Term(fieldName,
valueFromFile))
does this quite well (around 0.1 secs elapsed time in jython code.)
Would it be advantageous to sort the
From: Karl Øie [mailto:[EMAIL PROTECTED]]
I have created a test class for working with Analyzers and ran into a strange
problem; I cannot search for text in fields with more than 10,000 words!?!?
Lucene by default stops indexing after the 10,000th token.
See
A new release of Lucene is available, 1.2 release candidate 3.
The new release can be downloaded from:
http://jakarta.apache.org/builds/jakarta-lucene/release/v1.2-rc3/
If no major problems are identified in the next few days, we will make a 1.2
final release--the first final release since
From: Mark Tucker [mailto:[EMAIL PROTECTED]]
What is the best way to
move the index from the build server to the search servers
and then change which index a user is searching against? I
am concerned about switching the index while a user is paging
through search results. Ideally
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Are you implying ( ... public synchronized Searcher
getSearcher()) to
use this synchronized method in a servlet/jsp thread as
well?
Yes.
Your jhtml example doesn't appear to be synchronized. Maybe I'm missing something though.
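The synchronized-getter pattern under discussion can be sketched in plain Java. This is a toy model: `Searcher` here is a stand-in for Lucene's class, and the version counter stands in for something like the index's last-modified time; all names are illustrative.

```java
// Sketch: share one searcher across servlet/JSP threads through a
// synchronized getter, reopening only when the index version changes.
public class SearcherCache {
    public static class Searcher {
        final long version;
        Searcher(long version) { this.version = version; }
    }

    private Searcher cached;
    private long currentVersion;  // stand-in for IndexReader.lastModified()

    // Called by the indexing side after it commits changes.
    public synchronized void indexChanged(long newVersion) {
        currentVersion = newVersion;
    }

    // All request threads call this; only one reopen happens per change.
    public synchronized Searcher getSearcher() {
        if (cached == null || cached.version != currentVersion) {
            cached = new Searcher(currentVersion);  // "reopen"
        }
        return cached;
    }
}
```

The point of synchronizing the getter is that concurrent request threads never race to reopen, and every thread sees either the old or the new searcher, never a half-initialized one.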
From: Kelvin Tan [mailto:[EMAIL PROTECTED]]
True (and it's great) that once an IndexReader is open, no
actions on the IndexWriter affect it.
However, if an IndexReader is opened _after_ indexing begins,
I suppose it'll throw an exception? Doesn't it mean that when
indexing is taking
This bug has been fixed. The fix will be in tonight's nightly build.
Doug
--
To unsubscribe, e-mail: mailto:[EMAIL PROTECTED]
For additional commands, e-mail: mailto:[EMAIL PROTECTED]
From: Daniel Calvo [mailto:[EMAIL PROTECTED]]
I've just updated my version (via CVS) and now I'm having
problems with document deletion. I'm trying to delete a document using
IndexReader's delete(Term) method and I'm getting an IOException:
java.io.IOException: Index locked for write:
From: Jonathan Franzone [mailto:[EMAIL PROTECTED]]
Whenever I add a PrefixQuery to my search the scoring gets
really small. For
example if I do a query like this: +java then the scoring
starts around
0.866... and so forth. But if I do a query like this: +java* then the
scoring start
From: tal blum [mailto:[EMAIL PROTECTED]]
2) Does the Document id change after merging indexes, adding,
or deleting documents?
Yes.
4) Assuming I have a term query that has a large number of
hits, say 10 million, is there a way to get, say, the top
10 results without going through
I cannot replicate the problem you are having.
Can you please submit a complete, self-contained test case illustrating the
problem you are having with the write lock?
Please test this against the latest nightly build of Lucene, from:
http://jakarta.apache.org/builds/jakarta-lucene/nightly/
From: Joshua O'Madadhain [mailto:[EMAIL PROTECTED]]
After considerable study of the documentation, I am still
confused about the semantics of BooleanQuery.
Now, as sjb pointed out, (query, false, false) doesn't
really seem to have the semantics of a boolean OR.
In fact, it does.
In
From: Joshua O'Madadhain [mailto:[EMAIL PROTECTED]]
Is either of the expressions below the correct parenthesization of the
expression above? If not, what is?
score_d = sum_t(tf_q * (idf_t / norm_q) * tf_d * (idf_t / norm_d_t) *
boost_t) * coord_q_d
That's correct. The tf*idf weights
If you put the title in a separate field from the contents, and search both
fields, matches in the title will usually be stronger, without explicit
boosting. This is because the scores are normalized by the length of the
field, and the title tends to be much shorter than the contents. So even
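A toy illustration of the length normalization Doug describes: Lucene's default norm is 1/sqrt(numTerms), so the same single-term match contributes more in a short title field than in a long contents field. The tf/idf details are omitted here; this is a sketch, not Lucene's scoring code.

```java
// Sketch: scores are scaled by a norm that shrinks as the field grows,
// so short fields (titles) outweigh long ones (contents) per match.
public class FieldNorm {
    public static double norm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }
    // Contribution of one matching term, with tf/idf factors set to 1.
    public static double score(int fieldLength) {
        return 1.0 * norm(fieldLength);
    }
}
```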
From: Joshua O'Madadhain [mailto:[EMAIL PROTECTED]]
You cannot, in general, structure a Lucene query such that it
will yield
the same document rankings that Google would for that (query, document
set). The reason for this is that Google employs a scoring
algorithm that
includes
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]]
But, StandardAnalyzer is no longer final (get the latest
build) and you
can write a class that subclasses it
Right. To flesh out Otis' example of how to change StandardAnalyzer's stop
list by defining a subclass of it:
public class
From: Howk, Michael [mailto:[EMAIL PROTECTED]]
Also, Lucene returns the parsed version of each of our
searches. When we
search by rou*d, Lucene parses it as rou*d (which is what we
would expect).
But when we search by rou?d, Lucene parses it as rou d. It
seems to wrap
the term in
From: Aruna Raghavan [mailto:[EMAIL PROTECTED]]
I have noticed that unless I optimize the indexing while
adding documents to
it, the deleted documents are not getting physically deleted
right away
(even though they seem to have been flagged as deleted).
The searcher
could not find
Hinrich,
Can you please send a stack trace?
As others have mentioned, there isn't an index integrity checker.
Doug
P.S. Hi! How are you?
-Original Message-
From: H S [mailto:[EMAIL PROTECTED]]
Sent: Monday, April 01, 2002 5:26 PM
To: [EMAIL PROTECTED]
Subject: corrupted index
[I'm resending this from a different account, since my first attempt is
bogged down somewhere. A second copy will probably show up tomorrow, but in
the interests of solving this problem sooner, I'm resending it. Sorry for
the duplication.]
Define an Analyzer that does not lowercase the id
Peter Carlson wrote:
I don't know the actual algorithm, but when you type in the search
title:hello^3 AND heading:dolly^4
will produce different document scores than
title:hello AND heading:dolly^4
Lucene will get the score for a given document, not a field. So it does
combine the
Karl Øie wrote:
If a crash happens during writing, there is no good way to know whether the
index is intact; removing lock files doesn't help this fact, as we really
don't know. So providing rollback functionality is a good but expensive way
of compensating for the lack of recovery.
The
Karl Øie wrote:
A better solution would be to hack the FSDirectory to store each file it would
store in a file-directory as a serialized byte array in a blob of a sql
table. This would increase performance because the whole Directory doesn't
have to change each time, and it doesn't have to
Halácsy Péter wrote:
A lot of people have requested code to cache opened Searcher objects while the
index is not modified. The first version of this was written by Scott Ganyo and
submitted as IndexAccessControl to the list.
Now I've decoupled the logic that is needed to manage the searcher.
Scott Ganyo wrote:
I'd like to see the finalize() methods removed from Lucene entirely. In a
system with heavy load and lots of gc, using finalize() causes problems.
[ ... ]
External resources (i.e. file handles) are not released until the reader
is closed. And, as many have found,
Hang Li wrote:
Why are there so many final and package-protected methods?
The package private stuff was motivated by Javadoc. When I wrote Lucene I
wanted the Javadoc to make it easy to use. Thus I did not want the Javadoc
cluttered with lots of methods that 99% of users did not need to
Halácsy Péter wrote:
I made an IndexReaderCache class from the code you have sent (the code in
demo/Search.jhtml).
But this causes exception:
IndexSearcher searcher = new IndexSearcher(cache.getReader("/data/index"));
searcher.close();
searcher = new
Mike Tinnes wrote:
I'm trying to implement a HITS/PageRank type algorithm and need to modify
the document scores after a search is performed. The final score will be a
combination of the lucene score and PageRank. Is there currently a way to
modify the scores on the fly via HitCollector? so
Armbrust, Daniel C. wrote:
I don't know what a good numbers implementation is, but the way that I do it now,
with filters on the bit set after they come back just feels like a hack. Even if bit
sets are very fast, it doesn't seem right to iterate over nearly the entire set of
terms to filter
Terry Steichen wrote:
fine now. (I thought I read someplace that you didn't have to optimize after
a delete, but if I don't, it doesn't seem to work.)
You don't need to optimize after delete for search results to be
correct. However IndexReader.docFreq() may be incorrect until you've
Ian Lea wrote:
In org/apache/lucene/analysis/standard/StandardAnalyzer.java.
The source code for the current release is also on the website. In
particular, this file is available as:
http://jakarta.apache.org/lucene/src/java/org/apache/lucene/analysis/standard/StandardAnalyzer.java
Doug
Clemens Marschner wrote:
1. I think the new document boost is missing, isn't it?
With that it should be something like
score_d = sum_t(tf_q * idf_t / norm_q * tf_d * idf_t / norm_d_t * boost_t)
* coord_q_d * boost_d
Is that correct?
Almost. This should actually be boost_d * boost_d_t,
Mailing Lists Account wrote:
I need to search a bunch of documents. Each document needs to be searched
only once. That means once I build the index and search it, I have no need
for that index and the document again.
This does not sound like the problem that Lucene is designed to solve.
Ype Kingma wrote:
I extracted again, and found my problem:
One of the extracted files is lucene-1.2-src.jar. When unzipping this you
get a directory tree with only the directories mentioned.
As I recall, this jar contains only those java source files that are
generated by JavaCC. I don't
[EMAIL PROTECTED] wrote:
My first thought is to
define a Field.Keyword(composite-key, domain + \u + id). This
would allow me to use the delete(Term) interface to delete the key.
That sounds like a good way to solve this.
You could also use a HitCollector with a Query, but I think the
This looks like a good approach. When I get a chance, I'd like to make
Similarity an interface or an abstract class, whose default
implementation would do what the current class does, but whose methods
can be overridden. Then I'd add methods like:
public static void
Stephan Grimm wrote:
Is there a way to retrieve the original term positions during the search
process invoked by Searcher.search()? In addition to the documents and their
scores we want to have access to the positions of the terms found in order
to do a highlighting. We don't want to perform
Schaeffer, David wrote:
I am planning to upgrade from Lucene 1.0 to Jakarta Lucene 1.2. My current
implementation uses Jason Pell's URLDirectory class so that Lucene can access the
search index while running in an applet. I modified IndexReader.java to use
URLDirectory instead of
If you check the CHANGES file for changes made since the 1.2 release,
you'll find:
Added support for boosting the score of documents and fields via the
new methods Document.setBoost(float) and Field.setBoost(float).
Note: This changes the encoding of an indexed value. Indexes should
Right. Use the fields() iterator to scan for multiple Field instances
with the same name().
Doug
Rob Outar wrote:
Would the solution be to call Document.fields(), iterate through that enum
and get my data?
Thanks,
Rob
-Original Message-
From: Rob Outar
Kelvin Tan wrote:
Does an in-memory Field guarantee access to its name and value? Say I
retrieve a Field from a Document A, and add it to a new Document B.
Before writing B to the index, I delete A. Would B still contain the
Field? If so, does it work for both String-based and Reader-based
A self-contained, reproducible test case is required before someone can
really start looking at it. What is the history of this index? Have
attempts to update it ever failed prior to this?
Doug
Avi Drissman wrote:
At 8:56 AM -0400 9/20/02, you wrote:
Because of this problem, this issue
My guess is that you have around 40 fields. Each field requires a
separate file in each segment. Can you combine any of your fields?
Terry Steichen wrote:
I need to modify my original issue below. I was in error - the optimization
does indeed bring the total number of index files back to 46.
Isn't the break on line 162 of RangeQuery.java supposed to achieve this?
Alex Winston wrote:
Otis,
I was able to fix the JUnit build problems with the newest versions of
Ant with regard to the Lucene unit tests. It appears that junit.jar must
appear in the $ANT_HOME/lib dir in order to run
This would not be hard to implement.
It would take something like:
public abstract String[] IndexReader.getFieldNames();
This would need to be implemented in two classes, SegmentReader and
SegmentsReader. The former would just access its fieldInfos field to
list fields. The latter would
and writing at the same time?
I
thought I read this in the FAQ.
Roy.
-Original Message-
From: Doug Cutting [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, November 20, 2002 5:04 PM
To: Lucene Users List
Subject: Re: Stress/scalability testing Lucene
A deletion is only visible in other IndexReader instances created after
the IndexReader where you made the deletion is closed. So if you're
searching using a different IndexReader, you need to re-open it after
the deleting IndexReader is closed. The lastModified method helps you
to figure
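The visibility rule above can be modeled in plain Java. This is a toy sketch, not Lucene API: a "reader" is just a snapshot of the index taken at open time, and deletions only commit when the deleting reader is closed.

```java
import java.util.*;

// Sketch: readers see a point-in-time snapshot; deletions made through
// one reader become visible only to readers opened after it closes.
public class SnapshotIndex {
    private List<String> docs = new ArrayList<>();
    private List<String> pending;            // uncommitted deletions

    public void add(String doc) { docs.add(doc); }

    // "Opening a reader": copy the committed state.
    public List<String> openReader() {
        return new ArrayList<>(docs);
    }

    // Delete through a "deleting reader": buffered, not yet committed.
    public void delete(String doc) {
        if (pending == null) pending = new ArrayList<>(docs);
        pending.remove(doc);
    }

    // Closing the deleting reader commits its deletions.
    public void closeDeletingReader() {
        if (pending != null) { docs = pending; pending = null; }
    }
}
```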
mergeFactor?
--- Doug Cutting [EMAIL PROTECTED] wrote:
The data is actually removed the next time its segment is merged.
Optimizing forces it to happen, but it will also eventually happen as
more documents are added to the index, without optimization.
Scott Ganyo wrote:
It just marks the record
.
Is this right?
Thanks,
Otis
--- Doug Cutting [EMAIL PROTECTED] wrote:
Merging happens constantly as documents are added. Each document is
initially added in its own segment, and pushed onto the segment
stack.
Whenever there are mergeFactor segments on the top of the stack that
are
the same size
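The segment-stack behavior Doug describes can be simulated in a few lines of plain Java. This is a toy model, not Lucene's actual merge code: a segment's "size" here is just its document count, and merging is triggered whenever mergeFactor equal-sized segments sit on top of the stack.

```java
import java.util.*;

// Sketch: each added document becomes a size-1 segment pushed on a
// stack; mergeFactor same-size segments on top collapse into one,
// possibly cascading (ten 1s -> a 10, ten 10s -> a 100, ...).
public class SegmentStack {
    public static Deque<Integer> addDocs(int mergeFactor, int numDocs) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (int i = 0; i < numDocs; i++) {
            stack.push(1);
            while (topRunOfEqualSize(stack) >= mergeFactor) {
                int size = stack.peek();
                for (int j = 0; j < mergeFactor; j++) stack.pop();
                stack.push(size * mergeFactor);  // merged segment
            }
        }
        return stack;
    }

    // Count how many segments of equal size sit on top of the stack.
    private static int topRunOfEqualSize(Deque<Integer> stack) {
        int run = 0;
        Integer top = stack.peek();
        for (int s : stack) { if (s == top) run++; else break; }
        return run;
    }
}
```

With mergeFactor 10, after 123 added documents the stack (top to bottom) holds three 1-doc segments, two 10-doc segments, and one 100-doc segment.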
Clemens Marschner wrote:
So what if documents are deleted in the meantime? Then the recursive merge
can't determine the X segments with the same size.
If you read my previous message you'll find the answer:
Doug Cutting wrote:
It's actually a little more complicated than that, since (among
In the pre-release version available in the nightly builds you can boost
document fields at index time. Check out the CHANGES.txt file for details.
Doug
Ashley Collins wrote:
Is it possible to stop keyword fields contributing to a document's
score? Leaving only text fields?
Is the best way
I'm not sure I understand the question, but I'll hazard an answer
anyway. Might it work to maintain separate indexes for B, C, E and F,
then use a MultiSearcher to search them all? That would keep updates
local...
Doug
Cohan, Sean wrote:
I am a total newbie to Lucene. We are developing
I believe that the underlying search and indexing code should correctly
handle terms with zero-length text, although I have never tested this.
However I know of no query parser syntax to generate such terms in a
query. But it should work to use them in a manually constructed query.
Doug
Minh
Sale, Doug wrote:
it depends on what you mean by corrupt. i think there are 3 cases:
1) the process died during a non-writing action (woo-hoo!)
2) the process died during a user-writing action (building a document)
3) the process died during a system-writing action (writing an index file)
i
Armbrust, Daniel C. wrote:
While I was trying to build this index, the biggest limitation of
Lucene that I ran into was optimization. Optimization kills the
indexer's performance when you get between 3 and 5 million documents in an
index. On my Windows XP box, I had to reoptimize every 100,000
petite_abeille wrote:
On Tuesday, Dec 17, 2002, at 17:43 Europe/Zurich, Doug Cutting wrote:
Index updates are atomic, so it is very unlikely that the index is
corrupted, unless the underlying file system itself is corrupted.
Ummm... Perhaps in theory... In practice, indexes seem to get
petite_abeille wrote:
On Friday, Dec 20, 2002, at 19:58 Europe/Zurich, Scott Ganyo wrote:
FYI: The best thing I've found for both increasing speed and reducing
file handles is to use an IndexWriter on a RamDirectory for indexing
and then use IndexWriter.addIndexes() to write the result to
Erik Hatcher wrote:
Is it possible for me to retrieve all the values of a particular field
that exists within an index, across all documents?
For example, I'm indexing documents that have a category associated
with them. Several documents will share the same category. I'd like to
be able to
Terry Steichen wrote:
I tested StandardAnalyzer (which uses StandardTokenizer) by inputting a set of strings, which produced the following results:
aa/bb/cc/dd was tokenized into 4 terms: aa, bb, cc, dd
aa/bb/cc/d1 was tokenized into 3 terms: aa, bb, cc/d1
aa/bb/c1/dd was tokenized into 2
Erik Hatcher wrote:
I'd like to revisit this issue. First, I add the path field to the
Document in this way:
doc.add(Field.Keyword("path", path));
This field is, of course, not tokenized by the Analyzer, right? So
shouldn't QueryParser take this fact into account on a field-by-field
Doug Cutting wrote:
However, in most cases where this is an issue, the real problem is that
folks are placing too much reliance on the query parser. The query
parser is designed for user-entered queries. If you're programmatically
generating query strings that are then fed to the query
It should always be safe to search an index, even while optimizing.
Harpreet S Walia wrote:
Hi,
I am using Lucene on Windows and have the following query about optimization.
Is it safe to search if an optimize process is going on? I found a reference to this in the archives which said that on
Lucene is thread and process safe.
An IndexReader, once opened, always reflects the same state of the
index. To see changes made by another thread or process you must open a
new IndexReader.
Doug
Joe Consumer wrote:
I read a while back that Lucene is not thread safe.
That was in the FAQ on
My guess would be that you're using an IndexReader that has been closed.
Doug
petite_abeille wrote:
Hello,
Here is another symptom of misbehavior in Lucene:
java.io.IOException: Bad file descriptor
at java.io.RandomAccessFile.readBytes(Native Method)
at
petite_abeille wrote:
On Tuesday, Jan 7, 2003, at 22:46 Europe/Zurich, Doug Cutting wrote:
This could happen if Lucene's file locking is disabled or broken.
[ ... ]
File locking is known to be broken over NFS, and wasn't even present
in early versions of Lucene. Are you using an ordinary
Do you want hits to contain the word "words" or not? You've got it in
both clauses...
Also, +(a b c) requires that any of a, b, or c be in a document,
but not necessarily all of them. If you want it to contain all of them
then each term must be required, e.g., +a +b +c. In the latest
sources
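The clause semantics can be sketched in plain Java. Illustrative only: a document is modeled as the set of its terms, and the two methods correspond to "+a +b +c" (all required) versus "(a b c)" (any may match).

```java
import java.util.*;

// Sketch: required clauses must all be present; optional clauses match
// a document if any one of them is present.
public class BoolMatch {
    public static boolean allRequired(Set<String> doc, String... terms) {
        for (String t : terms) if (!doc.contains(t)) return false;
        return true;
    }
    public static boolean anyOptional(Set<String> doc, String... terms) {
        for (String t : terms) if (doc.contains(t)) return true;
        return false;
    }
}
```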
Terry Steichen wrote:
I read all the relevant references I could find in the Users (not
Developers) list, and I still don't exactly know what to do.
What I'd like to do is get a relevancy-based order in which (a) longer
documents tend to get more weight than shorter ones, (b) a document body
Terry Steichen wrote:
Can you give me an idea of what to replace the lengthNorm() method with to,
for example, remove any special weight given to shorter matching documents?
The goal of the default implementation is not to give any special weight
to shorter documents, but rather to remove the
Mailing Lists Account wrote:
Doug Cutting wrote:
That's because Google and most internet search engines never do any
stemming.
Generally speaking, are there any advantages not to apply the stemmer ?
Except for certain keywords,I found use of stemmers helpful.
Generally speaking, stemmers
Check out the new Explanation API in the latest CVS sources. It permits
one to get a detailed explanation of how a query was scored against a
document. Note that these explanations are designed for user perusal,
not for further computation, and are as expensive to construct as
re-running the
From: Doug Cutting [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Monday, February 10, 2003 1:57 PM
Subject: Re: Computing Relevancy Differently
Terry Steichen wrote:
Can you give me an idea of what to replace the lengthNorm() method
with
to,
for example, remove any special weight
Joseph Ottinger wrote:
Then this means that my IndexReader.delete(i) isn't working properly. What
would be the common causes for this? My log shows the documents being
deleted, so something's going wrong at that point.
Are you closing the IndexReader after doing the deletes? This is
required for
Ching-Pei Hsing wrote:
Even if we boost the Name by 10, as in the following query, it's still the
same.
query = (NAME:inn NAME:comfort NAME:shampoo)^10 (MMNUM:inn MMNUM:shampoo
MMNUM:comfort) (SMNUM:shampoo SMNUM:comfort SMNUM:inn)
In the 1.2 release, I don't think this sort of boosting (of a
Rishabh Bajpai wrote:
I am getting a long value between 1 (included) and 0 (excluded, I think), and it makes sense to me logically as well - I wouldn't know what a value greater than 1 would mean, and why should a term that has a score of 0 be returned in the first place! But just to be sure, I
Morus Walter wrote:
Searches must be possible on any combination of collections.
A typical search includes ~ 40 collections.
Now the question is, how to implement this in lucene best.
Currently I see basically three possibilities:
- create a data field containing the collection name for each document
There's a new Lucene release available for download.
See the website for details:
http://jakarta.apache.org/lucene/docs/index.html
Doug
Maik Schreiber wrote:
In an index I have documents with a field that has been constructed using
Field.UnIndexed(). Now I want to switch to Field.Keyword() so I can search
for those fields, too.
Does it cause any harm if I'm mixing field types like that?
I think this used to throw an exception,
Ulrich Mayring wrote:
does anyone know of good stopword lists for use with Lucene? I'm
interested in English and German lists.
The Snowball project has good stop lists.
See:
http://snowball.tartarus.org/
http://snowball.tartarus.org/english/stop.txt