Would the query run faster?
Exchanging the operands of AND would not make a noticeable difference
in speed. Queries are evaluated by iterating the inverted term index entries
for all query terms in parallel, with buffering.
Regards,
Paul Elschot
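The parallel iteration for an AND can be sketched in plain Java, independent of Lucene's actual scorer classes (this is only an illustration of the merge, not Lucene's implementation):

```java
import java.util.ArrayList;
import java.util.List;

public class AndSketch {
    // Intersect two sorted posting lists (doc ids) by advancing both
    // cursors in parallel. Both lists are walked once regardless of
    // operand order, which is why swapping the operands of an AND
    // makes no noticeable difference in speed.
    static List<Integer> intersect(int[] a, int[] b) {
        List<Integer> out = new ArrayList<Integer>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i++;
            } else if (a[i] > b[j]) {
                j++;
            } else {
                out.add(a[i]);
                i++;
                j++;
            }
        }
        return out;
    }
}
```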
the buffering for a TermScorer should be made dependent on its expected
use: more buffering for a top-level OR, less buffering when used under AND.
Regards,
Paul Elschot
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional
I would really think to do this all in one Query. Is this even possible?
How would you want to combine the results?
Regards,
Paul Elschot
Erik,
On Saturday 19 February 2005 01:33, Erik Hatcher wrote:
On Feb 18, 2005, at 6:37 PM, Paul Elschot wrote:
On Friday 18 February 2005 21:55, Erik Hatcher wrote:
On Feb 18, 2005, at 3:47 PM, Paul Elschot wrote:
Erik,
Just curious: it would seem easier to use multiple fields
On Saturday 19 February 2005 11:02, Erik Hatcher wrote:
On Feb 19, 2005, at 3:52 AM, Paul Elschot wrote:
By lowercasing the query text and searching in title_lc?
Well sure, but how about this query:
title:Something AND anotherField:someOtherValue
QueryParser, as-is, won't
Erik,
Just curious: it would seem easier to use multiple fields for the
original case and lowercase searching. Is there any particular reason
you analyzed the documents to multiple indexes instead of multiple fields?
Regards,
Paul Elschot
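The multiple-fields approach could look like this at indexing time (a sketch against the Lucene 1.4 API; the field names title and title_lc follow the discussion above):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class TitleFields {
    // Index the title twice: once with the original case and once
    // lowercased, so case-sensitive and case-insensitive searches
    // can both run against the same index.
    static Document makeDoc(String title) {
        Document doc = new Document();
        doc.add(Field.Text("title", title));
        doc.add(Field.Text("title_lc", title.toLowerCase()));
        return doc;
    }
}
```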
On Friday 18 February 2005 21:55, Erik Hatcher wrote:
On Feb 18, 2005, at 3:47 PM, Paul Elschot wrote:
Erik,
Just curious: it would seem easier to use multiple fields for the
original case and lowercase searching. Is there any particular reason
you analyzed the documents to multiple
instance with same name the gap is not needed.
Regards,
Paul Elschot
I hope this is clear! Kinda hard to articulate.
Owen
Erik
On Feb 12, 2005, at 3:08 PM, Owen Densmore wrote:
I'm getting a bit more serious about the final form of our lucene
index. Each document has
queries.
If it is, there is some code in development that might help.
In case it turns out that the memory occupied by the BitSet of the filter
is a bottleneck, please check the (very) recent archives of lucene-dev
on BitSet implementation.
Regards,
Paul Elschot
clauses in query.
In the development version this restriction is gone.
The limitation of the maximum clause count (default 1024,
configurable) is still there.
Regards,
Paul Elschot
On Friday 04 February 2005 17:29, Bill Tschumy wrote:
On Feb 4, 2005, at 10:19 AM, Bill Tschumy wrote:
On Feb 3, 2005, at 2:04 PM, Paul Elschot wrote:
On Thursday 03 February 2005 20:18, Bill Tschumy wrote:
Is there any way to construct a query to locate all documents
without
not carry the
old state forward.
The new constructor does carry the new state backward.
I'll post a fix in bugzilla later.
Thanks,
Paul Elschot.
names of all (other) indexed fields in the document.
Assuming there is always a primary key field the query is then:
+fieldnames:primarykeyfield -fieldnames:specificfield
Regards,
Paul Elschot
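The fieldnames trick might be set up like this (a sketch against the Lucene 1.4 API; the field names fieldnames, id and title are only examples):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class FieldNamesTrick {
    // At indexing time, record the name of every indexed field
    // in an extra "fieldnames" keyword field.
    static void addFieldNames(Document doc) {
        doc.add(Field.Keyword("fieldnames", "id"));
        doc.add(Field.Keyword("fieldnames", "title"));
    }

    // At search time: required primary key field, prohibited specific
    // field, i.e. +fieldnames:id -fieldnames:title
    static Query missingTitle() {
        BooleanQuery q = new BooleanQuery();
        q.add(new TermQuery(new Term("fieldnames", "id")), true, false);
        q.add(new TermQuery(new Term("fieldnames", "title")), false, true);
        return q;
    }
}
```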
build.xml.
You need to correct the version property in the build.xml file:
<property name="version" value="1.4.3"/>
Regards,
Paul Elschot.
beforehand):
cvs -d :pserver:[EMAIL PROTECTED]:/home/cvspublic checkout -r lucene_1_4_3 -d
lucene-1.4.3 jakarta_lucene
In there you can correct the build.xml file and do:
ant compile
to compile the source code.
Regards,
Paul Elschot
On Wednesday 02 February 2005 20:55, Helen Butler wrote:
Hi Paul
for the few minutes instead of hours,
Paul Elschot.
On Friday 28 January 2005 22:30, Andy Goodell wrote:
You should be fine.
For search performance, yes. But the extra field data does slow down
optimization of a modified index because all the field (and index) data
is read and written for that. When the extra data gets bulky, it's normally
better
:
+(synA1 synA2 ) +(synB1 synB2 ...) +(synC1 synC2 ...)
the development version of BooleanQuery might be a bit faster
than the current one.
For an interesting twist in the use of idf please search
for fuzzy scoring changes on lucene-dev at the end of 2004.
Regards,
Paul Elschot
me what I've done wrong?
Maybe all query hits were filtered out?
Could you compare the docnrs in the bits of the filter with the
unfiltered query hits docnrs?
Regards,
Paul Elschot
.
No one has done this yet, so I guess it's preferred to buy RAM instead...
Regards,
Paul Elschot
on your operating system, the size of the index, the amount
of RAM you can use, the file buffering efficiency, other loads on the
computer ...
c) Is there a faster method to what I am doing I should consider?
Preindexing all word combinations that you're interested in.
Regards,
Paul Elschot
Sorry for the duplicate on lucene-dev, it should have gone to lucene-user
directly:
A bit more:
On Thursday 06 January 2005 10:22, Paul Elschot wrote:
On Thursday 06 January 2005 02:17, Andrew Cunningham wrote:
Hi all,
I'm currently doing a query similar to the following:
for w
a score
that is within the range of the change.
Regards,
Paul Elschot.
level search methods.
Regards,
Paul Elschot
in createWeight().
Then inherit from QueryParser to use the above Query in getBooleanQuery().
Finally use such a query in a search: the document scores will be
the coord() values.
Regards,
Paul Elschot.
the name of that segment in
the deletable file, so it can try later to delete that segment.
This is known behaviour on FAT file systems. These randomly take some time
for themselves to finish closing a file after it has been correctly closed by
a program.
Regards,
Paul Elschot
that this relevant Document Id
originated from which MRG???
[ Something like this: Search word 'ISBN12345' is available from
MRGx ]
I think you are looking for the methods subSearcher() and subDoc() on
MultiSearcher.
Regards,
Paul Elschot
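A sketch of those methods in use (Lucene 1.4 API; the helper is made up for illustration):

```java
import org.apache.lucene.search.MultiSearcher;

public class SubIndexLookup {
    // Map a document number from MultiSearcher hits back to the
    // sub-index it came from and its local document number there.
    static String describe(MultiSearcher searcher, int docNr) {
        int sub = searcher.subSearcher(docNr);  // which sub-searcher
        int localDoc = searcher.subDoc(docNr);  // doc number within it
        return "doc " + docNr + " is local doc " + localDoc
            + " of sub-index " + sub;
    }
}
```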
only for search results on the query
over the whole index.
The bit filters generally work well, except when you need
a lot of very sparse filters and memory is a concern.
Regards,
Paul Elschot
. Higher powers as above can go a long way, though.
Regards,
Paul Elschot
Thanks,
Gururaja
Mike Snare [EMAIL PROTECTED] wrote:
I'm still new to Lucene, but wouldn't that be the coord()? My
understanding is that the coord() is the fraction of the boolean query
that matched a given
queries an index after it is opened.
Filters can be cached, see the recent discussion on CachingWrappingFilter
and friends.
Regards,
Paul Elschot
of the primary
key field can serve as the constant value.
Regards,
Paul Elschot
-Original Message-
From: Aviran [mailto:[EMAIL PROTECTED]
Sent: Thursday, December 09, 2004 2:08 PM
To: 'Lucene Users List'
Subject: RE: Retrieving all docs in the index
In this case you'll have to add
Paul,
On Friday 03 December 2004 23:31, you wrote:
Hi,
how would you restrict the search results for a certain user? I'm
One way to restrict results is by using a Filter.
indexing all the existing data in my application but there are certain
access levels so some users should see more
might also be used to reduce the I/O for searching, but Lucene
doesn't do that now, probably because there was little to gain.
Regards,
Paul Elschot.
P.S. The code doing the filtering is in IndexSearcher.java, from line 97
On Friday 03 December 2004 08:43, Paul Elschot wrote:
On Friday 03 December 2004 07:50, Chris Hostetter wrote:
...
So, If I'm understanding you (and the javadocs) correctly, the real key
here is maxMergeDocs. It seems like addDocument will never merge a
segment untill maxMergeDocs have
the DefaultSimilarity.
Regards,
Paul Elschot
with multiple threads.
Last time I checked that, there was a moderate speed up
using three threads instead of one on a single CPU machine.
Tuning the values of minMergeDocs and maxMergeDocs
may also help to increase performance of adding documents.
Regards,
Paul Elschot
of the parallel class hierarchy. That could also
involve some sort of accrual scorer and Lucene's Similarity.
Regards,
Paul Elschot
-Ken
On Sat, 13 Nov 2004 12:07:05 +0100, Paul Elschot [EMAIL PROTECTED]
wrote:
On Friday 12 November 2004 22:56, Chuck Williams wrote:
I had a similar need
Chris,
On Tuesday 23 November 2004 03:25, Hoss wrote:
(NOTE: numbers in [] indicate Footnotes)
I'm rather new to Lucene (and this list), so if I'm grossly
misunderstanding things, forgive me.
One of my main needs as I investigate Search technologies is to restrict
results based on Ranges
On Monday 22 November 2004 05:02, Kauler, Leto S wrote:
Hi Lucene list,
We have the need for analysed and 'not analysed/not tokenised' clauses
within one query. Imagine an unparsed query like:
+title:Hello World +path:Resources\Live\1
In the above example we would want the first clause
-space.
Does anyone work on a project like this?
I don't know. Is there a good SVD package for Java?
Regards,
Paul Elschot
On Thursday 18 November 2004 16:57, Rupinder Singh Mazara wrote:
hi all
I needed some help in solving the following problem
a user executes query1 and query2
both the queries( not result sets ) get stored, over time the user
wants to find
which documents from query1 are common to
On Wednesday 17 November 2004 01:20, Edwin Tang wrote:
Hello,
I have been using DateFilter to limit my search results to a certain date
range. I am now asked to replace this filter with one where my search
results
have document IDs greater than a given document ID. This document ID is
On Wednesday 17 November 2004 07:10, Karthik N S wrote:
Hi guys,
Apologies.
So a Merged Index is again a Single [ addition of subIndexes... ),
In that case, if One of the Field Types is of type 'Field.Keyword'
which is Unique across the subIndexes [Before Merging].
want to use a filter for the dates.
See DateFilter and the archives on MMDD.
Regards,
Paul Elschot.
On Friday 12 November 2004 22:56, Chuck Williams wrote:
I had a similar need and wrote MaxDisjunctionQuery and
MaxDisjunctionScorer. Unfortunately these are not available as a patch
but I've included the original message below that has the code (modulo
line breaks added by simple text email
matching exactly one character.
I think it would be better to encourage the users to use longer
and maybe also more prefixes. This gives more precise results
and is more efficient to execute.
Regards,
Paul Elschot
get what they pay for.
Imposing a minimum prefix length can be done by overriding the method
in QueryParser that provides a prefix query.
Regards,
Paul Elschot
code.
When you need more info on this, try lucene-dev.
Regards,
Paul Elschot.
,
but that is difficult to express in the current query syntax.
Regards,
Paul Elschot
prefix is required.
Regards,
Paul Elschot.
the value efficiently.
The only updates available are on the field norms.
Regards,
Paul Elschot
On Tuesday 09 November 2004 23:14, Luke Francl wrote:
On Tue, 2004-11-09 at 16:00, Paul Elschot wrote:
Lucene has no provision for matching by being prohibited only. This can
be achieved by indexing something for each document that can be
used in queries to match always, combined
there are more options
like using faster disks and/or using RAM for critical parts of the index.
Lucene can use extra RAM in various ways. To configure that one may have
to do some java coding. Profiling can guide you there.
Regards,
Paul Elschot
Regards,
Paul Elschot
when the documentID is created?
To know the docId use an indexed primary key in lucene and search
for it using IndexReader.termDocs(new Term(keyField, keyValue)).
Regards,
Paul Elschot.
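For example (a sketch against the Lucene 1.4 API; the field name "id" is illustrative):

```java
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;

public class DocIdByKey {
    // Look up the Lucene document number for a primary key value,
    // returning -1 when the key is not present in the index.
    static int docId(IndexReader reader, String keyValue) throws IOException {
        TermDocs td = reader.termDocs(new Term("id", keyValue));
        try {
            return td.next() ? td.doc() : -1;
        } finally {
            td.close();
        }
    }
}
```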
% iirc. More threads
were of no use for me in that case.
Regards,
Paul Elschot
Otis
--- Chris Fraschetti [EMAIL PROTECTED] wrote:
if i have four threads all trying to call my index function, will
lucene do what is necessary for each thread to wait until the writer
is available
to combine the other two to provide the search results,
usually a BooleanScorer or a ConjunctionScorer.
For proximity queries, other scorers are used.
Regards,
Paul Elschot
the {0,1} values for?
Regards,
Paul Elschot.
On Tuesday 12 October 2004 19:27, Paul Elschot wrote:
IndexReader.open(indexName).termDocs(new Term(field, term)).skipTo(documentNr)
returns the boolean indicating that.
Well, almost. When it returns true one still needs to check the TermDocs
for being at the documentNr.
Paul Elschot
of the query term weights would have
the query weights directly applied to the query term density in the document field,
whereas now the weights seem to be applied to the square root of the density.
The density value is an approximation, see above for the rough field norms.
Regards,
Paul Elschot
.
The encoding/decoding is somewhat rough, though.
Regards,
Paul Elschot
internally, a maximum was introduced to
avoid running out of memory.
You can change the maximum nr of added clauses using
BooleanQuery.setMaxClauseCount() but then it is advisable
to monitor memory usage, and possibly increase heap space for the JVM.
Regards,
Paul Elschot
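For example (4096 is an arbitrary illustrative value):

```java
import org.apache.lucene.search.BooleanQuery;

public class ClauseLimit {
    static void raiseLimit() {
        // Raise the clause limit before expanding large prefix/range
        // queries; watch heap usage and raise the JVM's -Xmx if needed.
        BooleanQuery.setMaxClauseCount(4096);
    }
}
```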
that lucene doesn't need to
search again, or would the search be cached and no delay arise?
Just looking for some ideas and possibly some implementational issues...
Lucene's Hits class is designed for paging through search results.
In which order would you need the 1,000,000 results?
Regards,
Paul
your doc ids instead of over dates.
This will give you a filter for the doc ids you want to query.
Regards,
Paul Elschot
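Such a filter over doc ids could be sketched like this (Lucene 1.4 API; the class name DocIdFilter is made up):

```java
import java.io.IOException;
import java.util.BitSet;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Filter;

// Admits only documents with a doc number greater than the given one.
public class DocIdFilter extends Filter {
    private final int minDocId;

    public DocIdFilter(int minDocId) {
        this.minDocId = minDocId;
    }

    public BitSet bits(IndexReader reader) throws IOException {
        BitSet bits = new BitSet(reader.maxDoc());
        bits.set(minDocId + 1, reader.maxDoc()); // turn on docs above the cutoff
        return bits;
    }
}
```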
to the Lucene formula without coord().
Regards,
Paul Elschot
On Tuesday 14 September 2004 23:49, Doug Cutting wrote:
Your analysis sounds correct.
At base, a weight is a normalized tf*idf. So a document weight is:
docTf * idf * docNorm
and a query weight is:
queryTf * idf * queryNorm
On Monday 20 September 2004 20:54, Shawn Konopinsky wrote:
Hey Paul,
Thanks for the quick reply. Excuse my ignorance, but what do I do with the
generated BitSet?
You can return it in the bits() method of the object implementing your
org.apache.lucene.search.Filter
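A minimal sketch of such a Filter (the class name is made up):

```java
import java.io.IOException;
import java.util.BitSet;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Filter;

// Wraps a BitSet that was computed beforehand, e.g. from a database
// query, and hands it to Lucene through bits().
public class PrecomputedFilter extends Filter {
    private final BitSet bits;

    public PrecomputedFilter(BitSet bits) {
        this.bits = bits;
    }

    public BitSet bits(IndexReader reader) throws IOException {
        return bits;
    }
}
```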
for this
particular search, where all other searches use the pool. Suggestions?
You could use a map from the IndexSearcher back to the IndexReader that was
used to create it. (It's a bit of a waste because the IndexSearcher has a reader
attribute internally.)
Regards,
Paul Elschot
on lucene-dev.
Regards,
Paul Elschot
the document table. The reason I want to do
this is to reduce the numbers of documents that the full text query will
run.
Regards,
Paul Elschot
optimisations (needed after adding/deleting docs) copy all data
so it pays to keep the Lucene indexes small.
Later you might need multiple indexes, MultiSearcher, and occasionally
a merge of the indexes.
Regards,
Paul Elschot
to crawl and index an intranet or more, have a look
at Nutch.
Regards,
Paul Elschot
then see the total disk size of for example the stored fields.
Regards,
Paul Elschot
Kevin,
On Thursday 05 August 2004 23:32, Kevin A. Burton wrote:
I'm trying to compute a filter to match documents in our index by a set
of terms.
For example some documents have a given field 'category' so I need to
compute a filter with multiple categories.
The problem is that our
On Wednesday 04 August 2004 18:22, John Z wrote:
Hi
I had a question related to number of fields in a document. Is there any
limit to the number of fields you can have in an index.
We have around 25-30 fields per document at present, about 6 are keywords,
Around 6 stored, but not indexed
On Monday 26 July 2004 21:41, John Patterson wrote:
Is there any way to cache TermDocs? Is this a good idea?
Lucene does this internally by buffering
up to 32 document numbers in advance for a query Term.
You can view the details here in case you're interested:
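In application code a similar batched read is possible through TermDocs.read() (a sketch against the Lucene 1.4 API; the buffer size 32 mirrors the internal value mentioned above):

```java
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;

public class BulkTermDocs {
    // Count the documents containing a term, reading doc numbers and
    // frequencies in batches instead of one next() call at a time.
    static int countDocs(IndexReader reader, Term term) throws IOException {
        int[] docs = new int[32];
        int[] freqs = new int[32];
        TermDocs td = reader.termDocs(term);
        int total = 0;
        try {
            int n;
            while ((n = td.read(docs, freqs)) > 0) {
                total += n;
            }
        } finally {
            td.close();
        }
        return total;
    }
}
```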
fit all, I don't think) and what syntax
should be used.
Paul Elschot created a surround query parser that he posted about to
the list in April.
Erik
Here is a bit about the syntax for Surround (mostly taken from the
posted tgz file). Basically one has to use an operator for everything
On Thursday 11 March 2004 06:15, Tomcat Programmer wrote:
I have a situation where I need to be able to find
incomplete word matches, for example a search for the
string 'ape' would return matches for 'grapes'
'naples' 'staples' etc. I have been searching the
archives of this user list and