From: Howk, Michael [mailto:[EMAIL PROTECTED]]
Also, Lucene returns the parsed version of each of our searches. When we search by rou*d, Lucene parses it as rou*d (which is what we would expect). But when we search by rou?d, Lucene parses it as rou d. It seems to wrap the term in
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]]
But, StandardAnalyzer is no longer final (get the latest build) and you can write a class that subclasses it
Right. To flesh out Otis' example of how to change StandardAnalyzer's stop list by defining a subclass of it:
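A minimal sketch of such a subclass (the class name and the stop-word list are illustrative, and it assumes the StandardAnalyzer(String[] stopWords) constructor):

import org.apache.lucene.analysis.standard.StandardAnalyzer;

public class MyStopAnalyzer extends StandardAnalyzer {
    // Illustrative stop list; use whatever words should be ignored at index
    // and query time.
    private static final String[] MY_STOP_WORDS = { "a", "an", "the", "lucene" };

    public MyStopAnalyzer() {
        super(MY_STOP_WORDS);   // hand the custom stop list to StandardAnalyzer
    }
}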
If you put the title in a separate field from the contents, and search both fields, matches in the title will usually be stronger, without explicit boosting. This is because the scores are normalized by the length of the field, and the title tends to be much shorter than the contents. So even
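A sketch of searching both fields with one query (the field names "title" and "contents" are assumptions, and `searcher` is an open IndexSearcher):

// Both clauses are optional, so a match in either field qualifies a document,
// and a match in the short title field scores relatively higher.
BooleanQuery q = new BooleanQuery();
q.add(new TermQuery(new Term("title", "lucene")), false, false);
q.add(new TermQuery(new Term("contents", "lucene")), false, false);
Hits hits = searcher.search(q);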
From: Joshua O'Madadhain [mailto:[EMAIL PROTECTED]]
You cannot, in general, structure a Lucene query such that it will yield the same document rankings that Google would for that (query, document set). The reason for this is that Google employs a scoring algorithm that includes
From: Joshua O'Madadhain [mailto:[EMAIL PROTECTED]]
After considerable study of the documentation, I am still confused about the semantics of BooleanQuery.
Now, as sjb pointed out, (query, false, false) doesn't really seem to have the semantics of a boolean OR.
In fact, it does. In
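A small sketch of the point, assuming the BooleanQuery.add(Query, boolean required, boolean prohibited) signature discussed in this thread:

// (q, false, false) adds q as an optional clause: a document matching either
// clause matches the whole query, which is boolean OR behaviour.
BooleanQuery or = new BooleanQuery();
or.add(new TermQuery(new Term("body", "cat")), false, false);
or.add(new TermQuery(new Term("body", "dog")), false, false);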
From: Joshua O'Madadhain [mailto:[EMAIL PROTECTED]]
Is either of the expressions below the correct parenthesization of the expression above? If not, what is?
score_d = sum_t(tf_q * (idf_t / norm_q) * tf_d * (idf_t / norm_d_t) * boost_t) * coord_q_d
That's correct. The tf*idf weights
From: tal blum [mailto:[EMAIL PROTECTED]]
2) Does the Document id change after merging indexes, adding, or deleting documents?
Yes.
4) Assuming I have a term query that has a large number of hits, say 10 million, is there a way to get, say, the top 10 results without going through
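For reference, a sketch of reading only the top hits with the Hits API (`searcher` is an open IndexSearcher); the search itself still scores every matching document in order to rank them, but stored fields are only fetched for the hits actually asked for:

Hits hits = searcher.search(new TermQuery(new Term("contents", "lucene")));
int n = Math.min(10, hits.length());
for (int i = 0; i < n; i++) {
    Document d = hits.doc(i);     // fetched lazily, on demand
    float score = hits.score(i);
}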
I cannot replicate the problem you are having.
Can you please submit a complete, self-contained test case illustrating the problem you are having with the write lock?
Please test this against the latest nightly build of Lucene, from:
http://jakarta.apache.org/builds/jakarta-lucene/nightly/
From: Jonathan Franzone [mailto:[EMAIL PROTECTED]]
Whenever I add a PrefixQuery to my search the scoring gets really small. For example, if I do a query like this: +java then the scoring starts around 0.866... and so forth. But if I do a query like this: +java* then the scoring starts
From: Kelvin Tan [mailto:[EMAIL PROTECTED]]
True (and it's great) that once an IndexReader is open, no actions on the IndexWriter affect it.
However, if an IndexReader is opened _after_ indexing begins, I suppose it'll throw an exception? Doesn't it mean that when indexing is taking
From: Mark Tucker [mailto:[EMAIL PROTECTED]]
What is the best way to move the index from the build server to the search servers and then change which index a user is searching against? I am concerned about switching the index while a user is paging through search results. Ideally
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Are you implying ( ... public synchronized Searcher getSearcher()) to use this synchronized method in a servlet/jsp thread as well?
Yes.
Your jhtml example doesn't appear to be synchronized. Maybe I'm missing something though.
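A minimal sketch of the pattern under discussion (the class and path names are illustrative): every request thread obtains its Searcher through one synchronized accessor, and the reference is swapped there once a new index has been installed.

import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Searcher;

public class SearcherHolder {
    private Searcher searcher;
    private String indexPath = "/path/to/index";   // illustrative location

    public synchronized Searcher getSearcher() throws IOException {
        if (searcher == null) {
            searcher = new IndexSearcher(indexPath);
        }
        return searcher;
    }

    public synchronized void switchTo(String newIndexPath) throws IOException {
        indexPath = newIndexPath;
        searcher = new IndexSearcher(newIndexPath);   // old searcher dropped; close it once idle
    }
}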
From: Karl Øie [mailto:[EMAIL PROTECTED]]
I have created a test class for working with Analyzers and ran into a strange problem; I cannot search for text in fields with more than 10,000 words!?!?
Lucene by default stops indexing after the 10,000th token. See
A new release of Lucene is available, 1.2 release candidate 3.
The new release can be downloaded from:
http://jakarta.apache.org/builds/jakarta-lucene/release/v1.2-rc3/
If no major problems are identified in the next few days, we will make a 1.2 final release--the first final release since
From: Ype Kingma [mailto:[EMAIL PROTECTED]]
I'm creating a filter from a set of terms that are read from a file, and I find that IndexReader.termDocs(Term(fieldName, valueFromFile)) does this quite well (around 0.1 secs elapsed time in jython code.)
Would it be advantageous to sort the
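A sketch of the kind of filter being described: a Filter subclass whose bit set is built from the posting lists of terms read from a file (the class name, field name, and term source are illustrative).

import java.io.IOException;
import java.util.BitSet;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.search.Filter;

public class TermListFilter extends Filter {
    private final String fieldName;
    private final String[] values;   // e.g. the terms read from the file

    public TermListFilter(String fieldName, String[] values) {
        this.fieldName = fieldName;
        this.values = values;
    }

    public BitSet bits(IndexReader reader) throws IOException {
        BitSet bits = new BitSet(reader.maxDoc());
        for (int i = 0; i < values.length; i++) {
            TermDocs docs = reader.termDocs(new Term(fieldName, values[i]));
            while (docs.next()) {
                bits.set(docs.doc());   // mark every document containing this term
            }
            docs.close();
        }
        return bits;
    }
}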
Kelvin,
I don't see Powered by Lucene on your results pages:
http://www.relevanz.com/Search?query=media
If you add this, we can add you to the Powered by Lucene page:
http://jakarta.apache.org/lucene/docs/powered.html
What other sites should be added to this page?
Doug
-Original Message-
In short, this is not currently supported, but might be someday.
For more details, see my recent response to a message with subject RE: Near
without slop.
Doug
-Original Message-
From: Tom Barrett [mailto:[EMAIL PROTECTED]]
Sent: Monday, December 03, 2001 3:42 PM
To: [EMAIL PROTECTED]
From: Paddy Clark [mailto:[EMAIL PROTECTED]]
My current NEAR solution is to modify the query parser to build a PhraseQuery from the terms surrounding NEAR and set the slop correctly. This works for this kind of query:
Bob NEAR Jim
The problem comes when I try
microsoft NEAR app*
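A sketch of the working case described above, with "Bob NEAR Jim" rewritten as a sloppy PhraseQuery (the field name and slop value are illustrative); the prefix term in "microsoft NEAR app*" has no direct equivalent here, since a PhraseQuery is built from exact terms:

PhraseQuery near = new PhraseQuery();
near.add(new Term("contents", "bob"));
near.add(new Term("contents", "jim"));
near.setSlop(10);   // allow up to 10 positions between the two terms
Hits hits = searcher.search(near);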
Lucene counts the same string in different fields as a different term. In other words, a term is composed of a field and a string.
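A tiny illustration of this (the field names are arbitrary):

Term inTitle = new Term("title", "lucene");
Term inBody  = new Term("body",  "lucene");   // a different term, despite the same text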
Doug
-Original Message-
From: Joshua O'Madadhain [mailto:[EMAIL PROTECTED]]
Sent: Saturday, December 01, 2001 6:55 PM
To: [EMAIL PROTECTED]
Subject:
From: Winton Davies [mailto:[EMAIL PROTECTED]]
I have 4 million documents... I could:
Split these into 4 x 1 million document indexes and then send a query to 4 Lucene processes? At the end I would have to sort the results by relevance.
Question for Doug or any other
From: New, Cecil (GEAE) [mailto:[EMAIL PROTECTED]]
I have noticed that when I kill/interrupt an indexing process, it leaves a lock file, preventing further indexing.
This raises a couple of questions:
a. When I simply delete the file and restart the indexing, it seems to work. Is
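A sketch of the manual cleanup being described, assuming the lock left behind is the "write.lock" file inside the index directory; deleting it is only safe when no other process is still writing to the index:

java.io.File lock = new java.io.File("/path/to/index", "write.lock");   // illustrative path
if (lock.exists()) {
    lock.delete();
}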
TermDocs are ordered by document number. It would not be easy to change
this.
Doug
-Original Message-
From: Winton Davies [mailto:[EMAIL PROTECTED]]
Sent: Thursday, November 29, 2001 11:12 AM
To: Lucene Users List
Subject: Re: Parallelising a query...
Hi again
If you are performing additions and deletions then you should serially create an IndexReader to do deletions, close it, then create an IndexWriter to do additions, close it, and so on. Note that typically one will use a different IndexReader for deletions than is used for searching, so that
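A sketch of that serial pattern (the path, analyzer, and id term are illustrative, and it assumes the IndexReader.delete(Term) convenience method):

// Phase 1: deletions through an IndexReader, then close it.
IndexReader reader = IndexReader.open("/path/to/index");
reader.delete(new Term("id", "doc-42"));   // delete every doc containing this term
reader.close();

// Phase 2: additions through an IndexWriter, then close it.
IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer(), false);
writer.addDocument(doc);                   // `doc` built elsewhere
writer.close();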
From: New, Cecil (GEAE) [mailto:[EMAIL PROTECTED]]
this is exactly what I was doing. Store=false, index=true, and token=false. It appeared to work ok, but searches *never* returned any hits. That's why I suspect it is a bug.
If you think this is a bug, please submit a test case, as
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
I think this still works if the document numbers continue to increase by one when documents are added incrementally.
Does anyone know if this is true (I haven't looked at the code yet)?
Yes, that is true, so long as you do not delete
org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:114)
org.apache.lucene.index.TermInfosReader.scanEnum(TermInfosReader.java:166)
I've attached the whole trace as gzipped.txt
regards,
Anders Nielsen
-Original Message-
From: Doug Cutting [mailto:[EMAIL PROTECTED]]
Sent: 10. november 2001 04:35
From: Anders Nielsen [mailto:[EMAIL PROTECTED]]
hmm, I seem to be getting a different number of hits when I use the files you sent out.
Please provide more information! Is it larger or smaller than before? By how much? What differences show up in the hits? That's a terrible bug
From: Scott Ganyo [mailto:[EMAIL PROTECTED]]
How difficult would it be to get BooleanQuery to do a standalone NOT, do you suppose? That would be very useful in my case.
It would not be that difficult, but it would make queries slow. All documents not containing a term would need to be
From: Paul Friedman [mailto:[EMAIL PROTECTED]]
It looks like there is a bug (besides the StandardAnalyzer parsing 20-35 as a single term). The query in your example:
search(searcher, analyzer, "FirstName:[a-k]");
is not finding the correct document. It is finding doc2, it
This should work. You should be able to find an un-tokenized field containing spaces with a TermQuery. Nothing should ever tokenize the string.
Can you please supply a simple, self-contained example showing that this does not work?
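A sketch of the behaviour being described (the field name and value are illustrative; Field.Keyword both stores and indexes the value as a single un-tokenized term, and `writer` and `searcher` are assumed to be open):

// Index an un-tokenized field whose value contains spaces...
Document doc = new Document();
doc.add(Field.Keyword("city", "New York"));   // one term, never tokenized
writer.addDocument(doc);

// ...then look it up with a TermQuery on the exact string.
Hits hits = searcher.search(new TermQuery(new Term("city", "New York")));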
Thanks,
Doug
-Original Message-
From: Winton
From: Sunil Zanjad [mailto:[EMAIL PROTECTED]]
Indexes left in an inconsistent state on crash (I don't remember who
I believe that even I have reported it. This happens on abrupt exit of the JVM.
To do this I had one thread updating a directory containing many .txt files and
From: Lee Mallabone [mailto:[EMAIL PROTECTED]]
I'm trying to implement this and should be able to contribute any successful results, but I need to produce context on a per-field basis. E.g. if I got a token hit in the text body of a document, but the first hit token was a word in the section
From: Scott Ganyo [mailto:[EMAIL PROTECTED]]
Thanks for the detailed information, Doug! That helps a lot.
Based on what you've said and on taking a closer look at the code, it looks like by setting mergeFactor and maxMergeDocs to Integer.MAX_VALUE, an entire index will be built in a
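A sketch of the tuning being referred to, assuming the public mergeFactor and maxMergeDocs fields mentioned above (the values and path are illustrative; very large values defer merging until the end):

IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer(), true);
writer.mergeFactor = Integer.MAX_VALUE;    // never merge because of segment count
writer.maxMergeDocs = Integer.MAX_VALUE;   // no cap on documents per merged segment
// ... addDocument() calls ...
writer.optimize();                         // collapse everything into one segment at the end
writer.close();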
From: Scott Ganyo [mailto:[EMAIL PROTECTED]]
We're having a heck of a time with too many file handles around here. When we create large indexes, we often get thousands of temporary files in a given index!
Thousands, eh? That seems high.
The maximum number of segments should be
From: Brook, James [mailto:[EMAIL PROTECTED]]
I am trying to use the 'lucene-1.2-rc1.jar' with a WebObjects 4.5 application, but having problems. WebObjects uses Java 1.1.8. I read on the jGuru Lucene FAQ that Lucene should work with this version of Java. Is this correct?
It should,