Advise regarding drawing search

2011-07-08 Thread Evert Wagenaar
Hello all, My colleagues and I would like some advice on the following: we have a very large number of technical drawings (about 300 million) which we want to make searchable. An OCR library has already been developed which is able to identify all components which may be available on a drawing,

Re: Extracting span terms using WeightedSpanTermExtractor

2011-07-08 Thread Jahangir Anwari
After applying the patch I was able to get the span positions for all the terms in the query. But now when I try to access the positionSpans of each span term, I cannot, because they are stored in a package-private PositionSpan class in WeightedSpanTerm.java, which prevents them from being visible o

Re: # as a special character?

2011-07-08 Thread Ian Lea
Searching for special characters can be a pain. There is a message thread from this list called "Lucene Analyzer that can handle C++ vs C#" that might help. -- Ian. On Wed, Jul 6, 2011 at 4:19 PM, Aradon Strider wrote: > Hello, > >  First off I am using the QueryParser with the standardanaly

Large indexes

2011-07-08 Thread Chris Bamford
Hi, I was wondering how to improve search performance over a set of indexes like this: 27G K1-1/index, 19G K1-2/index, 24G K1-3/index, 15G K1-4/index, 19G K1-5/index, 31G K2-1/index, 16G K2-2/index, 8.1G K2-3/index, 12G K2-4/index, 15G K2-5/index. In total it is

Re: Large indexes

2011-07-08 Thread Ian Lea
There are lots of general tips at http://wiki.apache.org/lucene-java/ImproveSearchingSpeed. What version of lucene? Recent releases should be faster. Have you tried with one big index? If everything is running on the same server that may well be faster. Even on single indexes, response of a few

Re: Extracting span terms using WeightedSpanTermExtractor

2011-07-08 Thread Mark Miller
On Jul 8, 2011, at 5:43 AM, Jahangir Anwari wrote: > I don't think this is the best > solution, am open to other alternatives. Could also make it static public where it is? Either way. - Mark Miller lucidimagination.com

Re: Large indexes

2011-07-08 Thread Simon Willnauer
On Fri, Jul 8, 2011 at 4:50 PM, Ian Lea wrote: > There are lots of general tips at > http://wiki.apache.org/lucene-java/ImproveSearchingSpeed. > > What version of lucene?  Recent releases should be faster. Have you > tried with one big index? If everything is running on the same server > that may

Re: Large indexes

2011-07-08 Thread Erick Erickson
Simply breaking up your index into separate pieces on the same machine buys you nothing; in fact, it costs you considerably. Have you put a profiler on the system to see what's happening? I expect you're swapping all over the place and are memory-constrained. Have you considered sharding your index