I didn't want to let this drop on the floor, but I haven't had the
time to craft a response to it either. So, just for the record, I agree
that transactions would be nice. I think that it is important that the
solution address change visibility and concurrent transactions within
multiple
Why does it seem to you that C# is faster than Java?
In any case, generally the bottleneck isn't the VM. It's the I/O to
the disks...
Scott
The reasonable man adapts himself to the world; the unreasonable one
persists in trying to adapt the world to himself. Therefore all
progress depends on the unreasonable man.
You can use:
BooleanQuery.setMaxClauseCount(int maxClauseCount);
to increase the limit.
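A minimal sketch of raising the cap (the 4096 figure and the wrapper class are illustrative choices, not from the original message):

```java
import org.apache.lucene.search.BooleanQuery;

public class RaiseClauseLimit {
    public static void main(String[] args) {
        // The default cap is 1024; a wide range query can expand past it.
        // 4096 is an arbitrary choice here; size it to your worst-case range.
        BooleanQuery.setMaxClauseCount(4096);
        System.out.println(BooleanQuery.getMaxClauseCount());
    }
}
```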
On Sep 30, 2004, at 8:24 PM, Chris Fraschetti wrote:
I recently read in regards to my problem that date_field:[0820483200
TO 110448]
is evaluated into a series of boolean queries ... which has a cap of
1024
At one point it definitely supported null for either term. I think
that has been removed/forgotten in the later revisions of the
QueryParser...
Scott
On Jun 10, 2004, at 1:24 PM, Erik Hatcher wrote:
On Jun 10, 2004, at 2:13 PM, Terry Steichen wrote:
Actually, QueryParser does support
It looks to me like Revision 1.18 broke it.
On Jun 10, 2004, at 3:26 PM, Erik Hatcher wrote:
On Jun 10, 2004, at 4:07 PM, Terry Steichen wrote:
Well, I'm using 1.4 RC3 and the null range upper limit works just
fine for
searches in two of my fields; one is in the form of a canonical date
(eg,
Well, I do like the *, but apparently there are some people that are
using this with the null...
Scott
On Jun 10, 2004, at 7:15 PM, Erik Hatcher wrote:
On Jun 10, 2004, at 4:54 PM, Scott ganyo wrote:
It looks to me like Revision 1.18 broke it.
It seems this could be it:
revision 1.18
date: 2002
I don't buy it. HashSet is but one implementation of a Set. By
choosing the HashSet implementation you are not only tying the class to
a hash-based implementation, you are tying the interface to *that
specific* hash-based implementation or its subclasses. In the end,
either you buy the
I have. While document.add() itself doesn't increase over time, the
merge does. Ways of partially overcoming this include increasing the
mergeFactor (but this will increase the number of file handles used),
or building blocks of the index in memory and then merging them to
disk. This has
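A sketch of the in-memory building approach (the analyzer, paths, and mergeFactor value are placeholders, not from the original message): build a block of the index in a RAMDirectory, then merge it into the disk index in one pass.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class BatchIndexer {
    public static void main(String[] args) throws Exception {
        // Build a block of the index entirely in memory...
        Directory ramDir = new RAMDirectory();
        IndexWriter ramWriter =
            new IndexWriter(ramDir, new StandardAnalyzer(), true);
        // ... call ramWriter.addDocument(doc) for each document in the batch ...
        ramWriter.close();

        // ...then merge the block into the on-disk index in one pass.
        IndexWriter diskWriter =
            new IndexWriter("/path/to/index", new StandardAnalyzer(), false);
        diskWriter.mergeFactor = 100;  // fewer merge passes, more file handles
        diskWriter.addIndexes(new Directory[] { ramDir });
        diskWriter.close();
    }
}
```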
No, you don't need required or prohibited, but you can't have both.
Here is a rundown:
* A required clause will allow a document to be selected if and only if
it contains that clause and will exclude any documents that don't.
* A prohibited clause will exclude any documents that contain that
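The combinations can be sketched with the two-boolean add() signature (field and term values here are made up):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

public class ClauseDemo {
    public static void main(String[] args) {
        BooleanQuery query = new BooleanQuery();
        // required=true, prohibited=false: document must contain "lucene"
        query.add(new TermQuery(new Term("body", "lucene")), true, false);
        // required=false, prohibited=true: document must not contain "draft"
        query.add(new TermQuery(new Term("body", "draft")), false, true);
        // required=false, prohibited=false: optional; only affects scoring
        query.add(new TermQuery(new Term("body", "index")), false, false);
        System.out.println(query.toString("body"));
    }
}
```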
I don't think adding extensive locking is necessary. What you are
probably experiencing is that you've closed the index before you're done
using it. If you aren't careful to close the index only after all
searches on it have been completed, you'll get an error like this.
Scott
[EMAIL PROTECTED]
Offhand, I would say that using 2 directories and merging them is
exactly what you want. It really shouldn't be all that complicated and
Lucene should handle the synchronization for you...
Scott
Dror Matalon wrote:
Hi folks,
We're in the process of adding search to our online RSS
Hi Eugene,
Yes. Doug (Cutting) added this to eliminate OutOfMemory errors that
apparently some people were having. Unfortunately, it causes
backward-compatibility issues if you were used to using version 1.2.
So, you'll need to add a call like this:
Yes. You can (and should for best performance) reuse an IndexSearcher
as long as you don't need access to changes made to the index. An open
IndexSearcher won't pick up changes to the index, so if you need to see
the changes, you will need to open a new searcher at that point.
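A sketch of that pattern (the path and the change-detection flag are placeholders): reuse one open searcher, and reopen only when the caller needs to see changes to the index.

```java
import org.apache.lucene.search.IndexSearcher;

public class SearcherCache {
    private final String indexPath;
    private IndexSearcher searcher;

    public SearcherCache(String indexPath) {
        this.indexPath = indexPath;
    }

    // Reuse the open searcher; reopen only when the caller knows the
    // index has changed and needs to see those changes.
    public synchronized IndexSearcher getSearcher(boolean indexChanged)
            throws java.io.IOException {
        if (searcher == null || indexChanged) {
            if (searcher != null) {
                searcher.close();  // only safe after in-flight searches finish
            }
            searcher = new IndexSearcher(indexPath);
        }
        return searcher;
    }
}
```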
Scott
Aviran
Be careful with option 1. NFS and the Lucene file-based locking
mechanism don't get along extremely well. (See the archives for details...)
Scott
Lienhard, Andrew wrote:
I can think of three options:
1) Single index dir on a shared drive (NFS, etc.) which is mounted on each
app server.
2)
Do these implementations maintain file compatibility with the Java version?
Scott
Erik Hatcher wrote:
I'd love to see there be quality implementations of the Lucene API in
other languages, that are up to date with the latest Java codebase.
I'm embarking on a Ruby port, which I'm hosting at
Nifty cool! I'm gonna like this, I can tell already!
I'm having a really hard time actually using Luke, though, as all the
window panes and table columns are apparently of fixed size. Do you
think you could throw in the ability to resize the various window
panes and table columns? This
+1. Support for transactions in Lucene is high on my list of desirable
features as well. I would love to have time to look into adding this,
but lately... well, you know how that goes.
Scott
Eric Jain wrote:
If you want to update a set of documents, you can remove their previous
version
It just marks the record as deleted. The record isn't actually removed
until the index is optimized.
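A sketch of the sequence (paths, analyzer, and the id term are placeholders): delete() marks the record in the .del file, and only optimize() rewrites the segments without it.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class DeleteThenOptimize {
    public static void main(String[] args) throws Exception {
        // Mark matching documents deleted (records a bit in the .del file).
        IndexReader reader = IndexReader.open("/path/to/index");
        reader.delete(new Term("id", "42"));
        reader.close();

        // Optimizing rewrites the segments, physically dropping the records.
        IndexWriter writer =
            new IndexWriter("/path/to/index", new StandardAnalyzer(), false);
        writer.optimize();
        writer.close();
    }
}
```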
Scott
Rob Outar wrote:
Hello all,
I used the delete(Term) method, then I looked at the index files,
only one
file changed _1tx.del I found references to the file still in some
of the
I'm rather partial to Jini for distributed systems, but I agree that
JXTA would definitely be the way to go on this type of peer-to-peer
scenario.
Scott
[EMAIL PROTECTED] wrote:
I'll be doing something very similar some time in the next 12 months for
the project I'm working on. I'll be more
Hi Alex,
I just looked at this and had the following thought:
The RangeQuery must continue to iterate after the first match is found
in order to match everything within the specified range. In other
words, if you have a range of a to d, you can't stop with a, you
need to continue to d. At
Actually, 10k isn't very large. We have indexes with more than 1M
records. It hasn't been a problem.
Scott
Tim Jones wrote:
Hi,
I am currently starting work on a project that requires indexing and
searching on potentially thousands, maybe tens of thousands, of text
documents.
I'm hoping
Cool. But instead of adding a new class, why not change Hits to inherit
from Filter and add the bits() method to it? Then one could pipe the
output of one Query into another search without modifying the Queries...
Scott
-Original Message-
From: Doug Cutting [mailto:[EMAIL PROTECTED]]
Are you closing the searcher after each when done?
No: Waiting for the garbage collector is not a good idea.
Yes: It could be a timeout on the OS holding the file handles.
Either way, the only real option is to avoid thrashing the searchers...
Scott
-Original Message-
From: Hang
It would seem that if there were an efficient implementation
of a forked file, perhaps that could be used instead of the set of files
that Lucene currently uses to represent a segment.
Scott
-Original Message-
From: Scott Ganyo [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, July 23, 2002 10:13 AM
I'd like to see the finalize() methods removed from Lucene entirely. In a
system with heavy load and lots of gc, using finalize() causes problems. To
wit:
1) I was at a talk at JavaOne last year where the gc performance experts
from Sun (the engineers actually writing the HotSpot gc) were
with them
rather than allowing finalization to take care of it.
Scott
-Original Message-
From: Doug Cutting [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, July 16, 2002 11:56 AM
To: Lucene Users List
Subject: Re: CachedSearcher
Scott Ganyo wrote:
I'd like to see the finalize
Deadlocks could be created if the order in which locks are obtained is not
consistent. Note, though, that the locks are obtained in the same order
each time throughout. (BTW: The inner lock is merely needed because the
wait/notify calls need to own the monitor.)
Naturally, you are free to make
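The ordering rule can be illustrated with plain Java monitors (names and workload are made up): as long as every thread takes lockA before lockB, no cycle of waiters can form.

```java
public class LockOrdering {
    private final Object lockA = new Object();
    private final Object lockB = new Object();
    private int counter = 0;

    // Every code path acquires lockA first, then lockB. Because the
    // order is consistent, two threads can never each hold one lock
    // while waiting on the other.
    public void update() {
        synchronized (lockA) {
            synchronized (lockB) {
                counter++;
            }
        }
    }

    public int read() {
        synchronized (lockA) {      // same order, even for readers
            synchronized (lockB) {
                return counter;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        final LockOrdering demo = new LockOrdering();
        Runnable work = new Runnable() {
            public void run() {
                for (int i = 0; i < 1000; i++) demo.update();
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(demo.read());  // 2000: no lost updates, no deadlock
    }
}
```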
-
From: Scott Ganyo [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, June 26, 2002 7:15 PM
To: 'Lucene Users List'
Subject: RE: Stress Testing Lucene
1) Are you sure that the index is corrupted? Maybe the file handles just
haven't been released yet. Did you try to reboot and try again?
2) To avoid the too-many files problem: a) increase the system file handle
limits, b) make sure that you reuse IndexReaders as much as you can rather
across
Use the java -Xmx option to increase your heap size.
Scott
-Original Message-
From: Nader S. Henein [mailto:[EMAIL PROTECTED]]
Sent: Thursday, June 13, 2002 12:20 PM
To: [EMAIL PROTECTED]
Subject: Boolean Query + Memory Monster
I have 1 Gig of memory on the machine with the
Actually, [] denotes an inclusive range of Terms. Anyway, why not change
the syntax if this is bad...?
Scott
-Original Message-
From: Brian Goetz [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, February 20, 2002 10:08 AM
To: Lucene Users List
Subject: Re: Queryparser croaking on [ and
+1
-Original Message-
From: Matt Tucker [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, January 22, 2002 11:06 AM
To: 'Lucene Users List'
Subject: RE: JDK 1.1 vs 1.2+
Hey all,
I'd just like to chime in support for dropping JDK 1.1,
especially if it
would aid i18n in Lucene.
We use Lucene extensively as a core part of our ASP product here at
eTapestry. In fact, we've built our database query engine on top of
it. We have been extremely pleased with the results.
Scott
Jeff Kunkle asks:
Does anyone know of any companies or agencies using Lucene for their
of a BooleanQuery subtract. Sure, it works, but it ain't pretty...
Scott
-Original Message-
From: Doug Cutting [mailto:[EMAIL PROTECTED]]
Sent: Thursday, November 01, 2001 10:49 AM
To: 'Lucene Users List'
Subject: RE: Problems with prohibited BooleanQueries
From: Scott Ganyo
P.S. At one point I tried doing an in-memory index using the
RAMDirectory
and then merging it with an on-disk index and it didn't work. The
RAMDirectory never flushed to disk... leaving me with an
empty index. I
think this is because of a bug in the mechanism that is
supposed
Not sure about the rest, but if you've stored your dates in yyyymmdd format,
you can use a RangeQuery like so:
dateField:[20011001-null]
This would return all dates on or after October 1, 2001.
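Programmatically, the same open-ended range can be built with a null upper term (the field name is assumed):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.RangeQuery;

public class OpenEndedDateRange {
    public static void main(String[] args) {
        // Dates stored as yyyymmdd strings sort lexicographically in date
        // order, so a term range matches a date range. A null upper term
        // leaves the range open-ended.
        RangeQuery onOrAfter = new RangeQuery(
            new Term("dateField", "20011001"),  // lower bound
            null,                                // no upper bound
            true);                               // inclusive
        System.out.println(onOrAfter);
    }
}
```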
Scott
-Original Message-
From: W. Eliot Kimber [mailto:[EMAIL PROTECTED]]
Sent: Tuesday,
Thanks for the detailed information, Doug! That helps a lot.
Based on what you've said and on taking a closer look at the code, it looks
like by setting mergeFactor and maxMergeDocs to Integer.MAX_VALUE, an entire
index will be built in a single segment completely in memory (using the
We're having a heck of a time with too many file handles around here. When
we create large indexes, we often get thousands of temporary files in a
given index! Even worse, we just plain run out of file handles--even on
boxes where we've upped the limits as much as we think we can! We've played