Re: Sort Performance Problems across large dataset

2005-01-24 Thread Matt Quail
Peter, Currently we can issue a simple search query and expect a response back in about 0.2 seconds (~3,000 results) You may want to try something like the following (I do this in FishEye, seems to be performant for moderately large field-spaces). Use a custom HitCollector, and store all the

Re: How to handle range queries over large ranges and avoid Too Many Boolean clauses

2004-05-18 Thread Matt Quail
Is there a simpler, easier way to do this? Yes. I have started implementing a QuickRangeQuery class, that doesn't have the BooleanQuery limitation, but scores every matching document as 1.0. I will see if I can get it finished in the next 24 hours, and post back to this thread. =Matt PS: I'm

Re: hierarchical search

2004-05-17 Thread Matt Quail
Fredrik, I would tackle your problem like this: Say that the field you want to index is path. I would turn this into *three* indexed fields: 1) multiple path prefixes (pre-paths) 2) multiple path suffixes (post-paths) 3) the number of components in the path (path-size). For example, for a path of
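A minimal sketch of how the three fields described above could be derived from a single path. The class name PathFields, its method names, the "/" separator, and the exact shape of each prefix and suffix string are assumptions for illustration; the original message is cut off before its own example, so this is not the author's code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: derives pre-paths, post-paths and path-size from a path string.
final class PathFields {

    /** Splits a path such as "/a/b/c" into its non-empty components: [a, b, c]. */
    static List<String> components(String path) {
        List<String> parts = new ArrayList<>();
        for (String p : path.split("/")) {
            if (!p.isEmpty()) {
                parts.add(p);
            }
        }
        return parts;
    }

    /** Every leading sub-path, shortest first: /a, /a/b, /a/b/c. */
    static List<String> prefixes(String path) {
        List<String> out = new ArrayList<>();
        StringBuilder sb = new StringBuilder();
        for (String p : components(path)) {
            sb.append('/').append(p);
            out.add(sb.toString());
        }
        return out;
    }

    /** Every trailing sub-path, shortest first: c, b/c, a/b/c. */
    static List<String> suffixes(String path) {
        List<String> parts = components(path);
        List<String> out = new ArrayList<>();
        for (int i = parts.size() - 1; i >= 0; i--) {
            out.add(String.join("/", parts.subList(i, parts.size())));
        }
        return out;
    }

    /** Number of components in the path. */
    static int size(String path) {
        return components(path).size();
    }
}
```

Each value returned by prefixes and suffixes would be added to the document as a separate untokenized term, so that a query for one prefix or suffix is an exact term match.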

Re: Memory Requirements

2004-05-13 Thread Matt Quail
I noticed that most users have +- 1G of RAM to run Lucene. Does anyone have experiences running it on a 128MB or 256MB machine? I regularly test my app that uses Lucene by passing -Xmx8m to the JVM; this is on a box with 1G of RAM, but the JVM never uses more than 8M. My app runs fine (though there

Re: Memory Requirements

2004-05-13 Thread Matt Quail
(and a BitSet) for queries where I am not interested in the score. Apart from that, I'm not aware of any other methods for reducing the memory consumption. =Matt Sascha Ottolski wrote: Am Donnerstag, 13. Mai 2004 12:56 schrieb Matt Quail: I noticed that most users have +- 1G of RAM to run Lucene

Re: Mixing database and lucene searches

2004-05-11 Thread Matt Quail
Eric Jain wrote: To ask a silly question: What approach does Lucene use for ranges and sorting? A range such as '10-60' is expanded into a boolean query containing all terms that are in the index and lie within the specified range, e.g. '10 or 11 or 20 or 59'. Yes, using a range search requires
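The expansion described in the quoted text can be illustrated without Lucene. In this sketch a sorted set of strings stands in for the terms of one indexed field; the class name RangeExpansion and the sample terms are assumptions, and the code only mimics the behaviour described above rather than reproducing Lucene's implementation.

```java
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical illustration: a range is rewritten into an OR over the
// terms that actually occur in the index and fall inside the range.
final class RangeExpansion {

    /** Returns the expanded query text for the inclusive range [lower, upper]. */
    static String expand(NavigableSet<String> indexTerms, String lower, String upper) {
        // subSet compares strings lexicographically, as term order in an index does.
        return String.join(" or ", indexTerms.subSet(lower, true, upper, true));
    }

    public static void main(String[] args) {
        NavigableSet<String> terms = new TreeSet<>();
        terms.add("05");
        terms.add("10");
        terms.add("11");
        terms.add("20");
        terms.add("59");
        terms.add("61");
        System.out.println(expand(terms, "10", "60")); // prints "10 or 11 or 20 or 59"
    }
}
```

The number of clauses grows with the number of distinct terms inside the range, which is why wide ranges over fields with many distinct values can run into the BooleanQuery clause limit.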

Re: Mixing database and lucene searches

2004-05-10 Thread Matt Quail
Glen Stampoultzis wrote: Anyone have any strategies for dealing with this? I'm wondering whether it's better to replicate searchable fields in the lucene index. This means being very careful that updates get done in two places so it is not ideal. If you *can* manage to update your index when the

Re: Range searches for numbers

2004-05-06 Thread Matt Quail
Reece, What's the best way to store numbers for range searching? If someone has some info about this I'd love to see it. I implemented a LongField that encodes any +ve or -ve long into a string that sorts correctly. I posted that class here: http://www.mail-archive.com/[EMAIL
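One possible encoding with the property described above, as a sketch only. The LongField class referred to in the message is not shown here, so the class name SortableLong and the scheme (flip the sign bit, then write 16 zero-padded hex digits) are assumptions, not the posted implementation.

```java
// Hypothetical encoder: maps any long to a fixed-width string whose
// lexicographic order matches the numeric order of the original values.
final class SortableLong {

    /** Encodes a long as 16 lowercase hex digits that sort in numeric order. */
    static String encode(long value) {
        // Flipping the sign bit moves negative values below positive ones
        // when the result is read as an unsigned 64-bit number.
        return String.format("%016x", value ^ Long.MIN_VALUE);
    }

    /** Reverses encode. */
    static long decode(String term) {
        return Long.parseUnsignedLong(term, 16) ^ Long.MIN_VALUE;
    }
}
```

Because every encoded term has the same width, a string range over the encoded field selects the same documents as the numeric range over the original values.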

Re: Presentation in Mtl

2004-04-14 Thread Matt Quail
I too gave a Lucene presentation to my local JUG (Canberra, Australia) last night. It also went over very well. Lucene totally rocks! =Matt Stephane James Vaucher wrote: Hi everyone, I did a presentation tonight in Montreal at a java users group meeting. I've got to say that they were maybe 4

Re: code works with 1.3-rc1 but not with 1.3-final??

2004-03-22 Thread Matt Quail
Or use IndexWriter.setUseCompoundFile(true) to reduce the number of files created by Lucene. http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexWriter.html#setUseCompoundFile(boolean) =Matt Kevin A. Burton wrote: Dan wrote: I have some code that creates a lucene index. It

Re: PrefixQuery and hieracical queries problem

2004-03-19 Thread Matt Quail
* * @author Matt Quail (http://madbean.com/) */ public class HieracicalTreeExample { private static final Analyzer ANALYZER = new WhitespaceAnalyzer(); private static final File sIndexDir = new File("d:/tmp/store"); public static void main(String[] args) throws Exception { File treeRoot

Re: java.io.tmpdir as lock dir .... once again

2004-03-02 Thread Matt Quail
I had to do something similar to make the application work with lucene 1.3 final when upgrading from 1.3 RC1. I think it is better to maintain backward compatibility so existing users are not affected too much when a new release is available. I'd like to me too this sentiment. That change caused me a

Re: Iterating TermEnum backwards

2004-02-26 Thread Matt Quail
I know I could invert my dates (something like MAX_LONG - date) to get the REVERSE order, but I want to be able to do least recent and most recent. Why not have two date fields, one inverted and one not? PS: my current solution is to do a binary search between MIN and MAX, halving my search
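The two-field suggestion above might look like the following sketch. The class name DateTerms, the 19-digit zero-padded decimal format, and the assumption that dates are non-negative millisecond timestamps are all illustrative choices, not taken from the thread.

```java
// Hypothetical term encoders for a pair of date fields. Enumerating either
// field forwards then yields the least recent or the most recent date first.
final class DateTerms {

    /** Term for the normal field: smaller term means earlier date. */
    static String forward(long millis) {
        return String.format("%019d", millis);
    }

    /** Term for the inverted field: smaller term means later date. */
    static String inverted(long millis) {
        // Long.MAX_VALUE has 19 decimal digits, so the width stays fixed
        // for any non-negative timestamp.
        return String.format("%019d", Long.MAX_VALUE - millis);
    }
}
```

With both fields indexed, the first term of the forward field corresponds to the least recent document and the first term of the inverted field to the most recent one, so neither lookup needs to walk the term list backwards.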

Iterating TermEnum backwards

2004-02-25 Thread Matt Quail
Hi all, Is there any way to iterate through a TermEnum backwards? Okay, I know that there isn't a way to do this via the TermEnum class, but is it implementable on top of the underlying Lucene datastore? My particular problem is this: I have an index of documents, each document has a date field

1.3-final: now giving me java.io.FileNotFoundException (Too many open files)

2004-01-21 Thread Matt Quail
I'm getting the following stack trace from lucene-1.3-final running on JDK 1.4.2_03-b02 on linux java.io.FileNotFoundException: /home/matt/blah/idx/_123n.tis (Too many open files) at java.io.RandomAccessFile.open(Native Method) at