Re: Query Tuning

2005-02-21 Thread Paul Elschot
) Would the query run faster? Exchanging the operands of AND would not make a noticeable difference in speed. Queries are evaluated by iterating the inverted term index entries for all query terms in parallel, with buffering. Regards, Paul Elschot
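A minimal sketch of why operand order matters little, assuming each query term is represented by a sorted list of document numbers. This is a toy model, not Lucene code: the class and method names are made up for illustration, and the buffering mentioned above is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of AND over two sorted postings lists (document numbers).
// Not Lucene code: names are illustrative only.
class PostingsIntersection {

    // Walk both lists in parallel, advancing whichever one is behind.
    static List<Integer> and(int[] a, int[] b) {
        List<Integer> hits = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {
                hits.add(a[i]);   // document contains both terms
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i++;
            } else {
                j++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] rare = {3, 40, 97};
        int[] common = {1, 3, 5, 8, 40, 41, 60, 97, 99};
        // Both operand orders visit the same entries and produce the same hits.
        System.out.println(and(rare, common));
        System.out.println(and(common, rare));
    }
}
```

Because both lists are consumed in step, swapping the arguments changes neither the result nor the amount of work in this model.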

Re: Query Tuning

2005-02-21 Thread Paul Elschot
the buffering for a TermScorer should be made dependent on its expected use: more buffering for top level OR, less buffering when used under AND. Regards, Paul Elschot - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional

Re: Optional Terms in a single query

2005-02-21 Thread Paul Elschot
? (type:1 81) I would really think to do this all in one Query. Is this even possible? How would you want to combine the results? Regards, Paul Elschot

Re: Lucene in the Humanities

2005-02-19 Thread Paul Elschot
Erik, On Saturday 19 February 2005 01:33, Erik Hatcher wrote: On Feb 18, 2005, at 6:37 PM, Paul Elschot wrote: On Friday 18 February 2005 21:55, Erik Hatcher wrote: On Feb 18, 2005, at 3:47 PM, Paul Elschot wrote: Erik, Just curious: it would seem easier to use multiple fields

Re: Lucene in the Humanities

2005-02-19 Thread Paul Elschot
On Saturday 19 February 2005 11:02, Erik Hatcher wrote: On Feb 19, 2005, at 3:52 AM, Paul Elschot wrote: By lowercasing the querytext and searching in title_lc ? Well sure, but how about this query: title:Something AND anotherField:someOtherValue QueryParser, as-is, won't

Re: Lucene in the Humanities

2005-02-18 Thread Paul Elschot
Erik, Just curious: it would seem easier to use multiple fields for the original case and lowercase searching. Is there any particular reason you analyzed the documents to multiple indexes instead of multiple fields? Regards, Paul Elschot

Re: Lucene in the Humanities

2005-02-18 Thread Paul Elschot
On Friday 18 February 2005 21:55, Erik Hatcher wrote: On Feb 18, 2005, at 3:47 PM, Paul Elschot wrote: Erik, Just curious: it would seem easier to use multiple fields for the original case and lowercase searching. Is there any particular reason you analyzed the documents to multiple

Re: Multiple Keywords/Keyphrases fields

2005-02-16 Thread Paul Elschot
instance with same name the gap is not needed. Regards, Paul Elschot I hope this is clear! Kinda hard to articulate. Owen Erik On Feb 12, 2005, at 3:08 PM, Owen Densmore wrote: I'm getting a bit more serious about the final form of our lucene index. Each document has

Re: chained restrictive queries

2005-02-14 Thread Paul Elschot
queries. If it is, there is some code in development that might help. In case it turns out that the memory occupied by the BitSet of the filter is a bottleneck, please check the (very) recent archives of lucene-dev on BitSet implementation. Regards, Paul Elschot

Re: Problem searching Field.Keyword field

2005-02-10 Thread Paul Elschot
clauses in query. In the development version this restriction has gone. The limitation of the maximum clause count (default 1024, configurable) is still there. Regards, Paul Elschot

Re: Searching for doc without a field

2005-02-04 Thread Paul Elschot
On Friday 04 February 2005 17:29, Bill Tschumy wrote: On Feb 4, 2005, at 10:19 AM, Bill Tschumy wrote: On Feb 3, 2005, at 2:04 PM, Paul Elschot wrote: On Thursday 03 February 2005 20:18, Bill Tschumy wrote: Is there any way to construct a query to locate all documents without

Re: Rewrite causes BooleanQuery to loose required terms

2005-02-03 Thread Paul Elschot
not carry the old state forward. The new constructor does carry the new state backward. I'll post a fix in bugzilla later. Thanks, Paul Elschot.

Re: Searching for doc without a field

2005-02-03 Thread Paul Elschot
names of all (other) indexed fields in the document. Assuming there is always a primary key field the query is then: +fieldnames:primarykeyfield -fieldnames:specificfield Regards, Paul Elschot
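The effect of the +fieldnames/-fieldnames query can be sketched without Lucene, assuming each document carries an extra "fieldnames" field holding the names of its indexed fields. The class below is a hypothetical in-memory model, not Lucene API; it only mirrors the required/prohibited logic of the query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// In-memory model of "+fieldnames:required -fieldnames:prohibited".
// Hypothetical helper for illustration, not part of Lucene.
class MissingFieldDemo {

    // Each set models the terms of the "fieldnames" field of one document;
    // the list position stands in for the document number.
    static List<Integer> docsWithout(List<Set<String>> fieldNamesPerDoc,
                                     String requiredField,
                                     String prohibitedField) {
        List<Integer> hits = new ArrayList<>();
        for (int docNr = 0; docNr < fieldNamesPerDoc.size(); docNr++) {
            Set<String> names = fieldNamesPerDoc.get(docNr);
            // The required clause matches every document (primary key),
            // the prohibited clause removes those that have the field.
            if (names.contains(requiredField) && !names.contains(prohibitedField)) {
                hits.add(docNr);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Set<String>> index = new ArrayList<>();
        index.add(new HashSet<String>(Arrays.asList("id", "title")));
        index.add(new HashSet<String>(Arrays.asList("id")));
        index.add(new HashSet<String>(Arrays.asList("id", "title", "author")));
        // Documents that have "id" but no "title".
        System.out.println(docsWithout(index, "id", "title"));
    }
}
```

The required clause on the primary key is what gives the prohibited clause something to subtract from.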

Re: Compile lucene

2005-02-02 Thread Paul Elschot
build.xml. You need to correct the version property in the build.xml file: <property name="version" value="1.4.3"/> Regards, Paul Elschot.

Re: Compile lucene

2005-02-02 Thread Paul Elschot
beforehand): cvs -d :pserver:[EMAIL PROTECTED]:/home/cvspublic checkout -r lucene_1_4_3 -d lucene-1.4.3 jakarta_lucene In there you can correct the build.xml file and do: ant compile to compile the source code. Regards, Paul Elschot On Wednesday 02 February 2005 20:55, Helen Butler wrote: Hi Paul

Re: Subversion conversion

2005-02-02 Thread Paul Elschot
for the few minutes instead of hours, Paul Elschot.

Re: Penalty for storing unrelated field?

2005-01-29 Thread Paul Elschot
On Friday 28 January 2005 22:30, Andy Goodell wrote: You should be fine. For search performance, yes. But the extra field data does slow down optimization of a modified index because all the field (and index) data is read and written for that. When the extra data gets bulky, it's normally better

Re: Suggestions for documentation or LIA

2005-01-26 Thread Paul Elschot
: +(synA1 synA2 ...) +(synB1 synB2 ...) +(synC1 synC2 ...) the development version of BooleanQuery might be a bit faster than the current one. For an interesting twist in the use of idf please search for fuzzy scoring changes on lucene-dev at the end of 2004. Regards, Paul Elschot

Re: Filtering w/ Multiple Terms

2005-01-24 Thread Paul Elschot
me what I've done wrong? Maybe all query hits were filtered out? Could you compare the docnrs in the bits of the filter with the unfiltered query hits docnrs? Regards, Paul Elschot
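One way to do that comparison is sketched below, assuming the filter's bits and the document numbers of the unfiltered hits have already been collected. The helper is hypothetical and uses only java.util.BitSet; how the two inputs are obtained from Lucene is not shown.

```java
import java.util.BitSet;

// Compare a filter's bits with the document numbers of unfiltered hits.
// Hypothetical debugging helper, not Lucene code.
class FilterCheck {

    // Returns the hit document numbers that the filter lets through.
    static BitSet surviving(BitSet filterBits, int[] unfilteredHitDocNrs) {
        BitSet hits = new BitSet();
        for (int docNr : unfilteredHitDocNrs) {
            hits.set(docNr);
        }
        hits.and(filterBits);   // keep only documents allowed by the filter
        return hits;
    }

    public static void main(String[] args) {
        BitSet filterBits = new BitSet();
        filterBits.set(2);
        filterBits.set(5);
        filterBits.set(9);
        // An empty result would mean every query hit was filtered out.
        System.out.println(surviving(filterBits, new int[] {1, 5, 7}));
        System.out.println(surviving(filterBits, new int[] {1, 7}).isEmpty());
    }
}
```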

Re: Opening up one large index takes 940M or memory?

2005-01-22 Thread Paul Elschot
. No one has done this yet, so I guess it's preferred to buy RAM instead... Regards, Paul Elschot

Re: Span Query Performance

2005-01-06 Thread Paul Elschot
on your operating system, the size of the index, the amount of RAM you can use, the file buffering efficiency, other loads on the computer ... c) Is there a faster method to what I am doing I should consider? Preindexing all word combinations that you're interested in. Regards, Paul Elschot

Re: Span Query Performance

2005-01-06 Thread Paul Elschot
Sorry for the duplicate on lucene-dev, it should have gone to lucene-user directly: A bit more: On Thursday 06 January 2005 10:22, Paul Elschot wrote: On Thursday 06 January 2005 02:17, Andrew Cunningham wrote: Hi all, I'm currently doing a query similar to the following: for w

Re: document boost not showing up in Explanation

2004-12-28 Thread Paul Elschot
a score that is within the range of the change. Regards, Paul Elschot.

Re: Word co-occurrences counts

2004-12-22 Thread Paul Elschot
level search methods. Regards, Paul Elschot

Re: Relevance percentage

2004-12-22 Thread Paul Elschot
in createWeight(). Then inherit from QueryParser to use the above Query in getBooleanQuery(). Finally use such a query in a search: the document scores will be the coord() values. Regards, Paul Elschot.
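For orientation, a small sketch of what a coord() value looks like as a relevance percentage. It assumes the usual definition of coord as the fraction of query clauses that matched (overlap divided by maxOverlap); the class and the percentage helper are illustrative only and leave out the custom Query and QueryParser subclasses described above.

```java
// Sketch of coord() as a fraction of matched clauses.
// Illustrative class, not the Similarity implementation itself.
class CoordDemo {

    // overlap: number of query clauses matched by the document,
    // maxOverlap: total number of clauses in the query.
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // Hypothetical helper turning the fraction into a whole percentage.
    static int percentage(int overlap, int maxOverlap) {
        return Math.round(100 * coord(overlap, maxOverlap));
    }

    public static void main(String[] args) {
        System.out.println(coord(2, 4));
        System.out.println(percentage(3, 4) + "%");
    }
}
```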

Re: index size doubled?

2004-12-21 Thread Paul Elschot
the name of that segment in the deletable file, so it can try later to delete that segment. This is known behaviour on FAT file systems. These randomly take some time for themselves to finish closing a file after it has been correctly closed by a program. Regards, Paul Elschot

Re: MergerIndex + Searchables

2004-12-21 Thread Paul Elschot
that this relevant Document Id Originated from Which MRG??? [ Something like this : - Search word 'ISBN12345' is available from MRGx ] I think you are looking for the methods subSearcher() and subDoc() on MultiSearcher. Regards, Paul Elschot

Re: Optimising A Security Filter

2004-12-20 Thread Paul Elschot
only for search results on the query over the whole index. The bit filters generally work well, except when you need a lot of very sparse filters and memory is a concern. Regards, Paul Elschot

Re: Relevance percentage

2004-12-20 Thread Paul Elschot
. Higher powers as above can come a long way, though. Regards, Paul Elschot Thanks, Gururaja Mike Snare [EMAIL PROTECTED] wrote: I'm still new to Lucene, but wouldn't that be the coord()? My understanding is that the coord() is the fraction of the boolean query that matched a given

Re: Permissioning Documents

2004-12-10 Thread Paul Elschot
queries an index after it is opened. Filters can be cached, see the recent discussion on CachingWrapperFilter and friends. Regards, Paul Elschot

Re: Retrieving all docs in the index

2004-12-09 Thread Paul Elschot
of the primary key field can serve as the constant value. Regards, Paul Elschot -Original Message- From: Aviran [mailto:[EMAIL PROTECTED] Sent: Thursday, December 09, 2004 2:08 PM To: 'Lucene Users List' Subject: RE: Retrieving all docs in the index In this case you'll have to add

Re: restricting search result

2004-12-04 Thread Paul Elschot
Paul, On Friday 03 December 2004 23:31, you wrote: Hi, how would you restrict the search results for a certain user? I'm One way to restrict results is by using a Filter. indexing all the existing data in my application but there are certain access levels so some users should see more

Re: restricting search result

2004-12-04 Thread Paul Elschot
might also be used to reduce the I/O for searching, but Lucene doesn't do that now, probably because there was little to gain. Regards, Paul Elschot. P.S. The code doing the filtering is in IndexSearcher.java, from line 97

Re: IndexWriter.optimize and memory usage

2004-12-03 Thread Paul Elschot
On Friday 03 December 2004 08:43, Paul Elschot wrote: On Friday 03 December 2004 07:50, Chris Hostetter wrote: ... So, If I'm understanding you (and the javadocs) correctly, the real key here is maxMergeDocs. It seems like addDocument will never merge a segment until maxMergeDocs have

Re: Does Lucene perform ranking in the retrieved set?

2004-11-30 Thread Paul Elschot
the DefaultSimilarity. Regards, Paul Elschot

Re: URGENT: Help indexing large document set

2004-11-24 Thread Paul Elschot
with multiple threads. Last time I checked that, there was a moderate speed up using three threads instead of one on a single CPU machine. Tuning the values of minMergeDocs and maxMergeDocs may also help to increase performance of adding documents. Regards, Paul Elschot

Re: lucene Scorers

2004-11-24 Thread Paul Elschot
of the parallel class hierarchy. That could also involve some sort of accrual scorer and Lucene's Similarity. Regards, Paul Elschot -Ken On Sat, 13 Nov 2004 12:07:05 +0100, Paul Elschot [EMAIL PROTECTED] wrote: On Friday 12 November 2004 22:56, Chuck Williams wrote: I had a similar need

Re: Numeric Range Restrictions: Queries vs Filters

2004-11-23 Thread Paul Elschot
Chris, On Tuesday 23 November 2004 03:25, Hoss wrote: (NOTE: numbers in [] indicate Footnotes) I'm rather new to Lucene (and this list), so if I'm grossly misunderstanding things, forgive me. One of my main needs as I investigate Search technologies is to restrict results based on Ranges

Re: Using multiple analysers within a query

2004-11-22 Thread Paul Elschot
On Monday 22 November 2004 05:02, Kauler, Leto S wrote: Hi Lucene list, We have the need for analysed and 'not analysed/not tokenised' clauses within one query. Imagine an unparsed query like: +title:Hello World +path:Resources\Live\1 In the above example we would want the first clause

Re: Lucene and SVD

2004-11-18 Thread Paul Elschot
-space. Does anyone work on a project like this? I don't know. Is there a good SVD package for Java? Regards, Paul Elschot

Re: boolean/set operations on lucene queries

2004-11-18 Thread Paul Elschot
On Thursday 18 November 2004 16:57, Rupinder Singh Mazara wrote: hi all I needed some help in solving the following problem a user executes query1 and query2 both the queries( not result sets ) get stored, over time the user wants to find which documents from query1 are common to

Re: Need help with filtering

2004-11-17 Thread Paul Elschot
On Wednesday 17 November 2004 01:20, Edwin Tang wrote: Hello, I have been using DateFilter to limit my search results to a certain date range. I am now asked to replace this filter with one where my search results have document IDs greater than a given document ID. This document ID is

Re: COUNT SUBINDEX [IN MERGERINDEX]

2004-11-17 Thread Paul Elschot
On Wednesday 17 November 2004 07:10, Karthik N S wrote: Hi guys, Apologies. So A Merged Index is again a Single [ addition of subIndexes... ), If that case , If One of the Field Types is of type 'Field.Keyword' which is Unique across the subIndexes [Before Merging].

Re: BooleanQuery - TooManyClauses Issue

2004-11-16 Thread Paul Elschot
want to use a filter for the dates. See DateFilter and the archives on MMDD. Regards, Paul Elschot.

Re: lucene Scorers

2004-11-13 Thread Paul Elschot
On Friday 12 November 2004 22:56, Chuck Williams wrote: I had a similar need and wrote MaxDisjunctionQuery and MaxDisjunctionScorer. Unfortunately these are not available as a patch but I've included the original message below that has the code (modulo line breaks added by simple text email

Re: Bug in the BooleanQuery optimizer? ..TooManyClauses

2004-11-13 Thread Paul Elschot
matching exactly one character. I think it would be better to encourage the users to use longer and maybe also more prefixes. This gives more precise results and is more efficient to execute. Regards, Paul Elschot

Re: Bug in the BooleanQuery optimizer? ..TooManyClauses

2004-11-12 Thread Paul Elschot
get what they pay for. Imposing a minimum prefix length can be done by overriding the method in QueryParser that provides a prefix query. Regards, Paul Elschot

Re: lucene Scorers

2004-11-12 Thread Paul Elschot
code. When you need more info on this, try lucene-dev. Regards, Paul Elschot.

Re: Query#rewrite Question

2004-11-11 Thread Paul Elschot
, but that is difficult to express in the current query syntax. Regards, Paul Elschot

Re: What is the difference between these searches?

2004-11-09 Thread Paul Elschot
prefix is required. Regards, Paul Elschot.

Re: can lucene be backed to have an update field

2004-11-09 Thread Paul Elschot
the value efficiently. The only updates available are on the field norms. Regards, Paul Elschot

Re: What is the difference between these searches?

2004-11-09 Thread Paul Elschot
On Tuesday 09 November 2004 23:14, Luke Francl wrote: On Tue, 2004-11-09 at 16:00, Paul Elschot wrote: Lucene has no provision for matching by being prohibited only. This can be achieved by indexing something for each document that can be used in queries to match always, combined

Re: Search speed

2004-11-02 Thread Paul Elschot
there are more options like using faster disks and/or using RAM for critical parts of the index. Lucene can use extra RAM in various ways. To configure that one may have to do some java coding. Profiling can guide you there. Regards, Paul Elschot

Re: Search speed

2004-11-02 Thread Paul Elschot
, Paul Elschot

Re: When do document ids change

2004-10-29 Thread Paul Elschot
when the documentID is created? To know the docId use an indexed primary key in lucene and search for it using IndexReader.termDocs(new Term(keyField, keyValue)). Regards, Paul Elschot.

Re: threading and indexing......

2004-10-16 Thread Paul Elschot
% iirc. More threads were of no use for me in that case. Regards, Paul Elschot Otis --- Chris Fraschetti [EMAIL PROTECTED] wrote: if i have four threads all trying to call my index function, will lucene do what is necessary for each thread to wait until the writer is available

Re: sorting and score ordering

2004-10-13 Thread Paul Elschot
to combine the other two to provide the search results, usually a BooleanScorer or a ConjunctionScorer. For proximity queries, other scorers are used. Regards, Paul Elschot

Re: Special field values

2004-10-12 Thread Paul Elschot
the {0,1} values for? Regards, Paul Elschot.

Re: Special field values

2004-10-12 Thread Paul Elschot
On Tuesday 12 October 2004 19:27, Paul Elschot wrote: IndexReader.open(indexName).termDocs(new Term(field, term)).skipTo(documentNr) returns the boolean indicating that. Well, almost. When it returns true one still needs to check the TermDocs for being at the documentNr. Paul Elschot
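The extra check can be illustrated with a toy model, assuming skipTo(target) moves to the first entry whose document number is greater than or equal to the target and returns whether such an entry exists. The class below only imitates that behaviour over an int array; it is not the TermDocs interface.

```java
// Toy postings cursor imitating skipTo()/doc() semantics.
// Not Lucene's TermDocs: an illustration over a sorted int array.
class SkipToDemo {
    private final int[] docs;   // sorted document numbers for one term
    private int pos = -1;

    SkipToDemo(int[] sortedDocs) {
        this.docs = sortedDocs;
    }

    // Move to the first entry beyond the current one with doc >= target.
    boolean skipTo(int target) {
        do {
            pos++;
        } while (pos < docs.length && docs[pos] < target);
        return pos < docs.length;
    }

    int doc() {
        return docs[pos];
    }

    // true only if the term occurs in exactly documentNr.
    static boolean termOccursIn(int[] postings, int documentNr) {
        SkipToDemo termDocs = new SkipToDemo(postings);
        // skipTo() alone may land on a later document, hence the doc() check.
        return termDocs.skipTo(documentNr) && termDocs.doc() == documentNr;
    }

    public static void main(String[] args) {
        int[] postings = {2, 5, 9};
        System.out.println(termOccursIn(postings, 5));
        System.out.println(termOccursIn(postings, 6));
    }
}
```

In the second call skipTo() succeeds because document 9 follows, which is exactly the case the follow-up message warns about.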

Re: How to pull document scoring values

2004-09-29 Thread Paul Elschot
of the query term weights would have the query weights directly applied to the query term density in the document field, whereas now the weights seem to be applied to the square root of the density. The density value is an approximation, see above for the rough field norms. Regards, Paul Elschot

Re: How to pull document scoring values

2004-09-29 Thread Paul Elschot
. The encoding/decoding is somewhat rough, though. Regards, Paul Elschot

Re: WildCardQuery

2004-09-21 Thread Paul Elschot
internally, a maximum was introduced to avoid running out of memory. You can change the maximum number of added clauses using BooleanQuery.setMaxClauseCount(), but then it is advisable to monitor memory usage and, if necessary, increase the heap space for the JVM. Regards, Paul Elschot
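A rough model of why the limit exists, assuming a prefix or wildcard pattern is expanded into one clause per matching index term. The class is hypothetical and uses a sorted set in place of the term index; IllegalStateException stands in for the too-many-clauses error.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy expansion of a prefix into one clause per matching term.
// Hypothetical model, not Lucene code.
class PrefixExpansionDemo {

    static List<String> expand(SortedSet<String> indexTerms, String prefix, int maxClauseCount) {
        List<String> clauses = new ArrayList<>();
        // Terms sharing a prefix are contiguous in sorted order.
        for (String term : indexTerms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            if (clauses.size() == maxClauseCount) {
                // Stand-in for the too-many-clauses error.
                throw new IllegalStateException("too many clauses for prefix " + prefix);
            }
            clauses.add(term);
        }
        return clauses;
    }

    public static void main(String[] args) {
        SortedSet<String> terms =
            new TreeSet<String>(Arrays.asList("apple", "apply", "apt", "banana"));
        System.out.println(expand(terms, "ap", 1024));
        try {
            expand(terms, "ap", 2);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Raising the limit in this model simply lets the clause list grow, which is why memory use is worth watching after a similar change in a real index.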

Re: displaying 'pages' of search results...

2004-09-21 Thread Paul Elschot
that lucene doesn't need to search again, or would the search be cached and no delay arise? Just looking for some ideas and possibly some implementational issues... Lucene's Hits class is designed for paging through search results. In which order would you need the 1.000.000 results? Regards, Paul

Re: Too many boolean clauses

2004-09-20 Thread Paul Elschot
you doc ids instead of over dates. This will give you a filter for the doc ids you want to query. Regards, Paul Elschot

Similarity scores: tf(), lengthNorm(), sumOfSquaredWeights().

2004-09-20 Thread Paul Elschot
to the Lucene formula without coord(). Regards, Paul Elschot On Tuesday 14 September 2004 23:49, Doug Cutting wrote: Your analysis sounds correct. At base, a weight is a normalized tf*idf. So a document weight is: docTf * idf * docNorm and a query weight is: queryTf * idf * queryNorm

Re: Too many boolean clauses

2004-09-20 Thread Paul Elschot
On Monday 20 September 2004 20:54, Shawn Konopinsky wrote: Hey Paul, Thanks for the quick reply. Excuse my ignorance, but what do I do with the generated BitSet? You can return it in the bits() method of the object implementing your org.apache.lucene.search.Filter

Re: Too many boolean clauses

2004-09-20 Thread Paul Elschot
for this particular search, where all other searches use the pool. Suggestions? You could use a map from the IndexSearcher back to the IndexReader that was used to create it. (It's a bit of a waste because the IndexSearcher has a reader attribute internally.) Regards, Paul Elschot

Re: Build problems

2004-09-03 Thread Paul Elschot
on lucene-dev. Regards, Paul Elschot

Re: Using 2nd Index to constraing Search

2004-08-27 Thread Paul Elschot
the document table. The reason I want to do this is to reduce the numbers of documents that the full text query will run. Regards, Paul Elschot

Re: Question concerning speed of Lucene.....

2004-08-27 Thread Paul Elschot
optimisations (needed after adding/deleting docs) copy all data so it pays to keep the Lucene indexes small. Later you might need multiple indexes, MultiSearcher, and occasionally a merge of the indexes. Regards, Paul Elschot

Re: How not to show results with the same score?

2004-08-25 Thread Paul Elschot
to crawl and index an intranet or more, have a look at Nutch. Regards, Paul Elschot

Re: Index Size

2004-08-19 Thread Paul Elschot
then see the total disk size of for example the stored fields. Regards, Paul Elschot

Re: Performance when computing computing a filter using hundreds of diff terms.

2004-08-06 Thread Paul Elschot
Kevin, On Thursday 05 August 2004 23:32, Kevin A. Burton wrote: I'm trying to compute a filter to match documents in our index by a set of terms. For example some documents have a given field 'category' so I need to compute a filter with multiple categories. The problem is that our

Re: Question on number of fields in a document.

2004-08-04 Thread Paul Elschot
On Wednesday 04 August 2004 18:22, John Z wrote: Hi I had a question related to number of fields in a document. Is there any limit to the number of fields you can have in an index. We have around 25-30 fields per document at present, about 6 are keywords, Around 6 stored, but not indexed

Re: Caching of TermDocs

2004-07-26 Thread Paul Elschot
On Monday 26 July 2004 21:41, John Patterson wrote: Is there any way to cache TermDocs? Is this a good idea? Lucene does this internally by buffering up to 32 document numbers in advance for a query Term. You can view the details here in case you're interested:

Syntax for query parsers

2004-06-09 Thread Paul Elschot
fit all, I don't think) and what syntax should be used. Paul Elschot created a surround query parser that he posted about to the list in April. Erik Here is a bit about the syntax for Surround (mostly taken from the posted tgz file). Basically one has to use an operator for everything

Re: incomplete word match

2004-03-11 Thread Paul Elschot
On Thursday 11 March 2004 06:15, Tomcat Programmer wrote: I have a situation where I need to be able to find incomplete word matches, for example a search for the string 'ape' would return matches for 'grapes' 'naples' 'staples' etc. I have been searching the archives of this user list and