Re: Computing Relevancy Differently

2003-02-10 Thread Doug Cutting
Terry Steichen wrote: Can you give me an idea of what to replace the lengthNorm() method with to, for example, remove any special weight given to shorter matching documents? The goal of the default implementation is not to give any special weight to shorter documents, but rather to remove the a
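
As a rough illustration of the question above, here is a minimal sketch of two length-normalization functions. It assumes the default factor has the shape 1/sqrt(number of terms), which the truncated snippet does not spell out; the class and method names are hypothetical and are not Lucene's API.

```java
final class LengthNormSketch {
    // Assumed default-style normalization: the factor shrinks as the field
    // gets longer, so a match in a long field counts for less.
    static float defaultLengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Flat alternative: field length has no effect on the score.
    static float flatLengthNorm(int numTerms) {
        return 1.0f;
    }

    public static void main(String[] args) {
        for (int n : new int[] {4, 16, 10000}) {
            System.out.println(n + " terms: default=" + defaultLengthNorm(n)
                    + " flat=" + flatLengthNorm(n));
        }
    }
}
```

Returning a constant is one way to take field length out of the ranking; whether that is the right replacement depends on the rest of the reply, which is cut off here.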

Re: Computing Relevancy Differently

2003-02-07 Thread Doug Cutting
Terry Steichen wrote: I read all the relevant references I could find in the Users (not Developers) list, and I still don't exactly know what to do. What I'd like to do is get a relevancy-based order in which (a) longer documents tend to get more weight than shorter ones, (b) a document body with

Re: how to join 2 queries togther

2003-01-20 Thread Doug Cutting
Do you want hits to contain the word "words" or not? You've got it in both clauses... Also, "+(a b c)" requires that any of "a" "b" or "c" be in a document, but not necessarily all of them. If you want it to contain all of them then each term must be required, e.g., "+a +b +c". In the latest
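
The difference between "+(a b c)" and "+a +b +c" can be sketched with plain sets. This is only a model of the matching rule described above, with hypothetical names, and it does not use the query parser.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class RequiredClauseSketch {
    // "+(a b c)": the group is required, but any one term inside satisfies it.
    static boolean matchesAny(Set<String> docTerms, List<String> terms) {
        for (String t : terms) {
            if (docTerms.contains(t)) {
                return true;
            }
        }
        return false;
    }

    // "+a +b +c": every term is individually required.
    static boolean matchesAll(Set<String> docTerms, List<String> terms) {
        return docTerms.containsAll(terms);
    }

    // Toy tokenizer: lowercase and split on whitespace.
    static Set<String> termsOf(String text) {
        return new HashSet<>(Arrays.asList(
                text.toLowerCase(Locale.ROOT).trim().split("\\s+")));
    }
}
```

A document containing only "a" passes the first test and fails the second.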

Re: read past EOF?

2003-01-08 Thread Doug Cutting
petite_abeille wrote: On Tuesday, Jan 7, 2003, at 22:46 Europe/Zurich, Doug Cutting wrote: This could happen if Lucene's file locking is disabled or broken. [ ... ] File locking is known to be broken over NFS, and wasn't even present in early versions of Lucene. Are you using a

Re: Bad file descriptor?

2003-01-08 Thread Doug Cutting
My guess would be that you're using an IndexReader that has been closed. Doug petite_abeille wrote: Hello, Here is another symptom of misbehavior in Lucene: java.io.IOException: Bad file descriptor at java.io.RandomAccessFile.readBytes(Native Method) at java.io.RandomAccessFile

Re: read past EOF?

2003-01-07 Thread Doug Cutting
It looks like the .fdx and one of the .f[0-9]* files are out of sync. The .fdx file for each segment should be exactly eight times as long as each of the .f[0-9]* files for that segment. This could happen if Lucene's file locking is disabled or broken. What version of Lucene are you using? What
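
The length rule above can be turned into a quick consistency check. This sketch assumes the rule applies to each .f[0-9]* file individually (eight bytes in .fdx per document, one byte per document in each .f file); the class and method names are hypothetical.

```java
final class FieldIndexCheckSketch {
    // Assumption: the .fdx file holds one 8-byte entry per document.
    static final int BYTES_PER_FDX_ENTRY = 8;

    // True when every .f[0-9]* length is exactly one eighth of the .fdx length.
    static boolean inSync(long fdxLength, long... normFileLengths) {
        for (long normLength : normFileLengths) {
            if (fdxLength != BYTES_PER_FDX_ENTRY * normLength) {
                return false;
            }
        }
        return true;
    }
}
```

The lengths would come from something like java.io.File.length() on the files of one segment.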

Re: Lucene and thread safety

2003-01-06 Thread Doug Cutting
Lucene is thread and process safe. An IndexReader, once opened, always reflects the same state of the index. To see changes made by another thread or process you must open a new IndexReader. Doug Joe Consumer wrote: I read a while back that Lucene is not thread safe. That was in the FAQ on L

Re: Optimization Question

2003-01-06 Thread Doug Cutting
It should always be safe to search an index, even while optimizing. Harpreet S Walia wrote: Hi, I am using lucene on windows and have the following query abt optimization. Is it safe to search if a optimize process is going on . i found a reference of this in the archives which said that on uni

Re: QueryParser question

2002-12-31 Thread Doug Cutting
Doug Cutting wrote: However, in most cases where this is an issue, the real problem is that folks are placing too much reliance on the query parser. The query parser is designed for user-entered queries. If you're programmatically generating query strings that are then fed to the

Re: QueryParser question

2002-12-31 Thread Doug Cutting
Erik Hatcher wrote: I'd like to revisit this issue. First, I add the path field to the Document in this way: doc.add(Field.Keyword("path", path)); This field is, of course, not tokenized by the Analyzer, right? So shouldn't QueryParser take this fact into account on a field-by-field bas

Re: Incomprehensible (to me) tokenizing behavior

2002-12-30 Thread Doug Cutting
Terry Steichen wrote: > PS: Is this kind of thing (and more importantly, any other similar > design issues) documented any place? This one is described in the source code, with the comment: // floating point, serial, model numbers, ip addresses, etc. // every other segment must have at least

Re: Incomprehensible (to me) tokenizing behavior

2002-12-30 Thread Doug Cutting
Terry Steichen wrote: I tested StandardAnalyzer (which uses StandardTokenizer) by inputting a set of strings which produced the following results: "aa/bb/cc/dd" was tokenized into 4 terms: aa, bb, cc, dd "aa/bb/cc/d1" was tokenized into 3 terms: aa, bb, cc/d1 "aa/bb/c1/dd" was tokenized into

Re: How to obtain unique field values

2002-12-30 Thread Doug Cutting
Erik Hatcher wrote: Is it possible for me to retrieve all the values of a particular field that exists within an index, across all documents? For example, I'm indexing documents that have a "category" associated with them. Several documents will share the same category. I'd like to be able t

Re: Lucene Benchmarks and Information

2002-12-20 Thread Doug Cutting
petite_abeille wrote: On Friday, Dec 20, 2002, at 19:58 Europe/Zurich, Scott Ganyo wrote: FYI: The best thing I've found for both increasing speed and reducing file handles is to use an IndexWriter on a RAMDirectory for indexing and then use IndexWriter.addIndexes() to write the result to disk.

Re: write.lock file

2002-12-20 Thread Doug Cutting
petite_abeille wrote: On Tuesday, Dec 17, 2002, at 17:43 Europe/Zurich, Doug Cutting wrote: Index updates are atomic, so it is very unlikely that the index is corrupted, unless the underlying file system itself is corrupted. Ummm... Perhaps in theory... In practice, indexes seems to get

Re: Lucene Benchmarks and Information

2002-12-20 Thread Doug Cutting
Armbrust, Daniel C. wrote: While I was trying to build this index, the biggest limitation of Lucene that I ran into was optimization. Optimization kills the indexers performance when you get between 3-5 million documents in an index. On my Windows XP box, I had to reoptimize every 100,000 docume

Re: write.lock file

2002-12-17 Thread Doug Cutting
Sale, Doug wrote: it depends on what you mean by corrupt. i think there are 3 cases: 1) the process died during a non-writing action (woo-hoo!) 2) the process died during a user-writing action (building a document) 3) the process died during a system-writing action (writing an index file) i do

Re: Empty phrase search

2002-12-17 Thread Doug Cutting
I believe that the underlying search and indexing code should correctly handle terms with zero-length text, although I have never tested this. However I know of no query parser syntax to generate such terms in a query. But it should work to use them in a manually constructed query. Doug Minh

Re: Indexing in a CBD Environment

2002-12-10 Thread Doug Cutting
I'm not sure I understand the question, but I'll hazard an answer anyway. Might it work to maintain separate indexes for B, C, E and F, then use a MultiSearcher to search them all? That would keep updates local... Doug Cohan, Sean wrote: I am a total newbie to Lucene. We are developing usin

Re: Keyword fields which don't contribute to a document's score?

2002-12-06 Thread Doug Cutting
In the pre-release version available in the nightly builds you can boost document fields at index time. Check out the CHANGES.txt file for details. Doug Ashley Collins wrote: Is it possible to stop keyword fields contributing to a document's score? Leaving only text fields? Is the best way t

Re: Incremental indexing

2002-12-06 Thread Doug Cutting
Eric Jain wrote: Currently, I use the following procedure to update an index incrementally: 1. Build document 2. Open index reader 3. Delete any previous version of the document using a key field 4. Close index reader 5. Open index writer 6. Add document to index 7. Cl

Re: How does delete work?

2002-11-23 Thread Doug Cutting
Clemens Marschner wrote: So what if documents are deleted in the meantime? Then the recursive merge can't determine the X segments with the same size. If you read my previous message you'll find the answer: Doug Cutting wrote: > It's actually a little more complicated than

Re: How does delete work?

2002-11-22 Thread Doug Cutting
0. Is this right? Thanks, Otis --- Doug Cutting <[EMAIL PROTECTED]> wrote: Merging happens constantly as documents are added. Each document is initially added in its own segment, and pushed onto the segment stack. Whenever there are mergeFactor segments on the top of the stack that ar
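
The merge behaviour described above can be simulated with a list of segment sizes. The snippet is cut off, so this sketch assumes that a merge is triggered when the top mergeFactor segments on the stack are the same size; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class MergeStackSketch {
    // Returns segment sizes in documents, bottom of the stack first.
    static List<Integer> addDocuments(int docCount, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be at least 2");
        }
        List<Integer> stack = new ArrayList<>();
        for (int i = 0; i < docCount; i++) {
            stack.add(1); // each document starts in its own segment
            // Assumed trigger: mergeFactor equal-sized segments on top.
            while (topRunLength(stack) >= mergeFactor) {
                int merged = 0;
                for (int k = 0; k < mergeFactor; k++) {
                    merged += stack.remove(stack.size() - 1);
                }
                stack.add(merged);
            }
        }
        return stack;
    }

    // Counts how many segments at the top of the stack share the top size.
    private static int topRunLength(List<Integer> stack) {
        if (stack.isEmpty()) {
            return 0;
        }
        int top = stack.get(stack.size() - 1);
        int run = 0;
        for (int i = stack.size() - 1; i >= 0 && stack.get(i) == top; i--) {
            run++;
        }
        return run;
    }
}
```

With a mergeFactor of 10, adding 25 documents leaves two segments of 10 and five of 1. Deletions are not modelled here.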

Re: How does delete work?

2002-11-22 Thread Doug Cutting
te: This is via mergeFactor? --- Doug Cutting <[EMAIL PROTECTED]> wrote: The data is actually removed the next time its segment is merged. Optimizing forces it to happen, but it will also eventually happen as more documents are added to the index, without optimization. Scott Ganyo wrote: It

Re: Updating documents

2002-11-22 Thread Doug Cutting
A deletion is only visible in other IndexReader instances created after the IndexReader where you made the deletion is closed. So if you're searching using a different IndexReader, you need to re-open it after the deleting IndexReader is closed. The lastModified method helps you to figure out

Re: How does delete work?

2002-11-22 Thread Doug Cutting
The data is actually removed the next time its segment is merged. Optimizing forces it to happen, but it will also eventually happen as more documents are added to the index, without optimization. Scott Ganyo wrote: It just marks the record as deleted. The record isn't actually removed until t

Re: Stress/scalability testing Lucene

2002-11-20 Thread Doug Cutting
writing at the same time? I thought I read this in the FAQ. Roy. -Original Message- From: Doug Cutting [mailto:[EMAIL PROTECTED]] Sent: Wednesday, November 20, 2002 5:04 PM To: Lucene Users List Subject: Re: Stress/scalability testing Lucene * Replies will be sent through Spamex to [EMAIL

Re: Stress/scalability testing Lucene

2002-11-20 Thread Doug Cutting
Justin Greene wrote: We created a thread pool to read and parse the email messages. 10 threads seems to be the magic number here for us. We then created a queue of messages to be indexed onto which we push the parsed messages and have a single thread adding messages to the index. IndexWriter.a

Re: extracting top k frequently occuring terms from a given set of documents

2002-11-15 Thread Doug Cutting
There was a class in the test directory that efficiently computed this, but I think Otis recently removed it. Perhaps it should be revived and go in the sandbox or something... http://nagoya.apache.org/eyebrowse/ReadMsg?listName=lucene-dev@;jakarta.apache.org&msgNo=2620 Doug Vinay Kakade wro

Re: How to get all field names

2002-11-12 Thread Doug Cutting
This would not be hard to implement. It would take something like: public abstract String[] IndexReader.getFieldNames(); This would need to be implemented in two classes, SegmentReader and SegmentsReader. The former would just access its fieldInfos field to list fields. The latter would iter

Re: Searching Ranges

2002-11-12 Thread Doug Cutting
Isn't the break on line 162 of RangeQuery.java supposed to achieve this? Alex Winston wrote: otis, i was able to fix the junit build problems, with the newest versions of ant in regards to lucene unit tests. it appears that the junit.jar must appear in the $ANT_HOME/lib dir in order to run such

Re: Mushrooming Index Files

2002-11-12 Thread Doug Cutting
My guess is that you have around 40 fields. Each field requires a separate file in each segment. Can you combine any of your fields? Terry Steichen wrote: I need to modify my original issue below. I was in error - the optimization does indeed bring the total number of index files back to 46.

Re: has this exception been seen before

2002-11-12 Thread Doug Cutting
A self-contained, reproducible test case is required before someone can really start looking at it. What is the history of this index? Have attempts to update it ever failed prior to this? Doug Avi Drissman wrote: At 8:56 AM -0400 9/20/02, you wrote: Because of this problem, this issue ha

Re: Deleting fields from a Document

2002-11-12 Thread Doug Cutting
Kelvin Tan wrote: Does an in-memory Field guarantee access to its name and value? Say I retrieve a Field from a Document A, and add it to a new Document B. Before writing B to the index, I delete A. Would B still contain the Field? If so, does it work for both String-based and Reader-based val

Re: Several fields with the same name

2002-11-06 Thread Doug Cutting
Right. Use the fields() iterator to scan for multiple Field instances with the same name(). Doug Rob Outar wrote: Would the solution be to call Document.fields(), iterate through that enum and get my data? Thanks, Rob -Original Message- From: Rob Outar [mailto:routar@;ideorlando.or

Re: hit scoring on latest build

2002-11-04 Thread Doug Cutting
If you check the CHANGES file for changes made since the 1.2 release, you'll find: Added support for boosting the score of documents and fields via the new methods Document.setBoost(float) and Field.setBoost(float). Note: This changes the encoding of an indexed value. Indexes should

Re: Deleting fields from a Document

2002-11-04 Thread Doug Cutting
Kelvin Tan wrote: Document maintains a linked list of Fields. It would not be difficult to delete a random Field, albeit a little inefficient. That would delete it from the in-memory representation, but, once it has been indexed, there is no easy way to remove a field value from a document

Re: Enabling URL-based read access to the search index

2002-10-16 Thread Doug Cutting
Schaeffer, David wrote: > I am planning to upgrade from Lucene 1.0 to Jakarta Lucene 1.2. My current >implementation uses Jason Pell's URLDirectory class so that Lucene can access the >search index while running in an applet. I modified IndexReader.java to use >URLDirectory instead of FSDir

Re: retrieving term positions during the search process

2002-10-16 Thread Doug Cutting
Stephan Grimm wrote: > Is there a way to retrieve the original term positions during the search > process invoked by Searcher.search()? In addition to the documents and their > scores we want to have access to the positions of the terms found in order > to do a highlighting. We don't want to perfo

Re: Question: using boost for sorting

2002-10-16 Thread Doug Cutting
This looks like a good approach. When I get a chance, I'd like to make Similarity an interface or an abstract class, whose default implementation would do what the current class does, but whose methods can be overridden. Then I'd add methods like: public static void Similarity.setDefaultS

Re: Deleting a document found in a search

2002-10-09 Thread Doug Cutting
[EMAIL PROTECTED] wrote: > My first thought is to > define a Field.Keyword("composite-key", domain + "\u" + id). This > would allow me to use the delete(Term) interface to delete the key. That sounds like a good way to solve this. You could also use a HitCollector with a Query, but I think

Re: 1.2 source jar incomplete?

2002-10-03 Thread Doug Cutting
Ype Kingma wrote: > I extracted again, and found my problem: > One of the extracted files is lucene-1.2-src.jar. When unzipping this you > get a directory tree with only the directories mentioned. As I recall, this jar contains only those java "source" files that are generated by JavaCC. I don'

Re: Problems with exact matces on non-tokenized fields...

2002-09-27 Thread Doug Cutting
lex Murzaku wrote: > I was trying this as well but now I get something I can't understand: > My query (Query: +element:POST +nr:3) is supposed to match only one > record. Indeed Lucene returns that record with the highest score but it > also returns others that shouldn't be there at all even if it

Re: Problems with exact matces on non-tokenized fields...

2002-09-26 Thread Doug Cutting
karl øie wrote: > I have a Lucene Document with a field named "element" which is stored > and indexed but not tokenized. The value of the field is "POST" > (uppercase). But the only way i can match the field is by entering > "element:POST?" or "element:POST*" in the QueryParser class. There ar

Re: Is Lucene suitable for one-time index and one-time search ?

2002-09-23 Thread Doug Cutting
Mailing Lists Account wrote: > In effect, what I am trying to do is 'Find in a File(s)' but with one or > more terms( containing AND/OR/phrases as operators.) > > It appears to me that the lucene as all pieces to solve this. That is, > extract the terms, index the document and run the query to se

Re: Is Lucene suitable for one-time index and one-time search ?

2002-09-21 Thread Doug Cutting
Mailing Lists Account wrote: > I need to search a bunch of documents.Each document needs to be searched > only once. That means once I build the index and search it, I have no need > for that index and the document again. This does not sound like the problem that Lucene is designed to solve. Luc

Re: GoogleQueryParser

2002-09-12 Thread Doug Cutting
Halácsy Péter wrote: > To be exact it's not a bug, it's feature ;) Well, the structured query language of >Lucene (and Google and others) is not a strict boolean language. For example I think >the QueryParser of Lucene do not support parenthesis: a AND (b OR C) Lucene's query parser does suppo

Re: Lucene's Ranking Function

2002-09-11 Thread Doug Cutting
Clemens Marschner wrote: > score_d = sum_t(tf_q * idf_t / norm_q * tf_d * idf_t / norm_d_t * boost_t) > * coord_q_d > > One last thing I wondered about: Is idf_t really going into that equation > twice? Yes. I think that's normal with tf/idf vector-space ranking methods. > From what I see, id
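
The formula quoted in this message can be evaluated directly. The sketch below follows it term by term, including idf_t appearing twice; the array layout and names are hypothetical, and it leaves out any document boost, which the quoted formula does not include.

```java
final class ScoreFormulaSketch {
    // Each row of terms is {tf_q, idf_t, tf_d, norm_d_t, boost_t}.
    static double score(double[][] terms, double normQ, double coordQD) {
        double sum = 0.0;
        for (double[] t : terms) {
            double tfQ = t[0];
            double idf = t[1];
            double tfD = t[2];
            double normDT = t[3];
            double boost = t[4];
            // tf_q * idf_t / norm_q * tf_d * idf_t / norm_d_t * boost_t
            sum += tfQ * idf / normQ * tfD * idf / normDT * boost;
        }
        return sum * coordQD; // the coordination factor scales the whole sum
    }
}
```

For a single term with tf_q = 1, idf_t = 2, tf_d = 3, norm_d_t = 1, boost_t = 1, norm_q = 2 and coord_q_d = 1, this gives 6.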

Re: Lucene's Ranking Function

2002-09-11 Thread Doug Cutting
Clemens Marschner wrote: > 1. I think the new document boost is missing, isn't it? > With that it should be something like > > score_d = sum_t(tf_q * idf_t / norm_q * tf_d * idf_t / norm_d_t * boost_t) > * coord_q_d * boost_d > Is that correct? Almost. This should actually be boost_d * boost_d

Re: Full List of Stop Words for Standard Analyzer.

2002-08-02 Thread Doug Cutting
Ian Lea wrote: > In org/apache/lucene/analysis/standard/StandardAnalyzer.java. The source code for the current release is also on the website. In particular, this file is available as: http://jakarta.apache.org/lucene/src/java/org/apache/lucene/analysis/standard/StandardAnalyzer.java Doug

Re: Deleting Problem

2002-08-01 Thread Doug Cutting
Terry Steichen wrote: > fine now. (I thought I read someplace that you didn't have to optimize after > a delete, but if I don't, it doesn't seem to work.) You don't need to optimize after delete for search results to be correct. However IndexReader.docFreq() may be incorrect until you've optim

Re: Using Filters in Lucene

2002-07-29 Thread Doug Cutting
Peter Carlson wrote: > Would you suggest that search in selection type functionality use filters or > redo the search with an AND clause? I'm not sure I fully understand the question. If you have a condition that is likely to re-occur commonly in subsequent queries, then using a Filter which caches

Re: Numeric Support

2002-07-26 Thread Doug Cutting
Armbrust, Daniel C. wrote: > I don't know what a "good" numbers implementation is, but the way that I do it now, >with filters on the bit set after they come back just feels like a hack. Even if bit >sets are very fast, it doesn't seem right to iterate over nearly the entire set of >terms to f

Re: Fields support

2002-07-25 Thread Doug Cutting
Armbrust, Daniel C. wrote: > I never have gotten any response to my question of why is there no native numeric >support in lucene - (Is it really hard, full redesign required, or has it just not >been done [and if it just hasn't been done when might it be done]) It would require a substantial r

Re: Modifying scores

2002-07-23 Thread Doug Cutting
Mike Tinnes wrote: > I'm trying to implement a HITS/PageRank type algorithm and need to modify > the document scores after a search is performed. The final score will be a > combination of the lucene score and PageRank. Is there currently a way to > modify the scores on the fly via HitCollector? s

Re: CachedSearcher

2002-07-17 Thread Doug Cutting
Halácsy Péter wrote: > I made an IndexReaderCache class from the code you have sent (the code in >demo/Search.jhtml). > But this causes exception: > IndexSearcher searcher = new IndexSearcher(cache.getReader("/data/index")); > searcher.close(); > > > searcher = new IndexSearcher(cache.getReader

Re: CachedSearcher

2002-07-16 Thread Doug Cutting
Hang Li wrote: > Why there are so many final and package-protected methods? The package private stuff was motivated by Javadoc. When I wrote Lucene I wanted the Javadoc to make it easy to use. Thus I did not want the Javadoc cluttered with lots of methods that 99% of users did not need to kno

Re: CachedSearcher

2002-07-16 Thread Doug Cutting
Scott Ganyo wrote: > I'd like to see the finalize() methods removed from Lucene entirely. In a > system with heavy load and lots of gc, using finalize() causes problems. > [ ... ] > External resources (i.e. file handles) are not released until the reader > is closed. And, as many have found, L

Re: CachedSearcher

2002-07-16 Thread Doug Cutting
Kelvin Tan wrote: > If the object has a close() method with public modifier, isn't it a common > idiom that client code needs to invoke close() explicitly? If there's no > real need to call close, maybe it can be changed to protected? Yes, that is a common idiom. In the case of Lucene's FSDire

Re: CachedSearcher

2002-07-15 Thread Doug Cutting
Halácsy Péter wrote: > A lot of people requested a code to cache opened Searcher objects until the index is >not modified. The first version of this was written by Scott Ganyo and submitted as >IndexAccessControl to the list. > > Now I've decoupled the logic that is needed to manage searcher. >

Re: Crash / Recovery Scenario

2002-07-10 Thread Doug Cutting
Karl Øie wrote: > A better solution would be to hack the FSDirectory to store each file it would > store in a file-directory as a serialized byte array in a blob of a sql > table. This would increase performance because the whole Directory don't have > to change each time, and it doesn't have

Re: Crash / Recovery Scenario

2002-07-10 Thread Doug Cutting
Karl Øie wrote: > If a crash happens during writing, there is no good way to know if the > index is intact, removing lock files doesn't help this fact, as we really > don't know. So providing rollback functionality is a good but expensive way > of compensating for lack of recovery. The

Re: Stress Testing Lucene

2002-06-27 Thread Doug Cutting
It's very hard to leave an index in a bad state. Updating the "segments" file atomically updates the index. So the only way to corrupt things is to only partly update the segments file. But that too is hard, since it's first written to a temporary file, which is then renamed "segments". Th
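
The write-then-rename pattern described above looks roughly like this with java.nio. It is a sketch of the general technique and not Lucene's own code; the temporary file name and the class are hypothetical, and whether an atomic move replaces an existing target is platform-specific.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class AtomicSegmentsSketch {
    // Write the new contents to a temporary file, then rename it to "segments".
    // Readers see either the old file or the new one, never a partial write.
    static void commit(Path indexDir, String contents) {
        Path tmp = indexDir.resolve("segments.new"); // hypothetical temp name
        Path live = indexDir.resolve("segments");
        try {
            Files.write(tmp, contents.getBytes(StandardCharsets.UTF_8));
            // Replacement of an existing target is implementation specific.
            Files.move(tmp, live, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String read(Path indexDir) {
        try {
            byte[] bytes = Files.readAllBytes(indexDir.resolve("segments"));
            return new String(bytes, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Commits twice into a fresh temporary directory and reads the result back.
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("index");
            commit(dir, "generation 1");
            commit(dir, "generation 2");
            return read(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A crash before the rename leaves the old "segments" file untouched, which is the property the message relies on.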

Re: Weighted index

2002-06-24 Thread Doug Cutting
Peter Carlson wrote: > I don't know the actual algorithm, but when you type in the search > > title:hello^3 AND heading:dolly^4 > > Will produce different document scores than > > title:hello AND heading:dolly^4 > > Lucene will get the score for a given document, not a field. So it does > comb

Re: Weighted index

2002-06-24 Thread Doug Cutting
Peter Carlson wrote: > I don't know the actual algorithm, but when you type in the search > > title:hello^3 AND heading:dolly^4 > > Will produce different document scores than > > title:hello AND heading:dolly^4 > > Lucene will get the score for a given document, not a field. So it does > comb

RE: QueryParser question - case-sensitivity

2002-05-09 Thread Doug Cutting
[I'm resending this from a different account, since my first attempt is bogged down somewhere. A second copy will probably show up tomorrow, but in the interests of solving this problem sooner, I'm resending it. Sorry for the duplication.] Define an Analyzer that does not lowercase the id field,

RE: corrupted index

2002-04-02 Thread Doug Cutting
Hinrich, Can you please send a stack trace? As others have mentioned, there isn't an index integrity checker. Doug P.S. Hi! How are you? > -Original Message- > From: H S [mailto:[EMAIL PROTECTED]] > Sent: Monday, April 01, 2002 5:26 PM > To: [EMAIL PROTECTED] > Subject: corrupted in

RE: Relevance Feedback

2002-03-30 Thread Doug Cutting
Dmitry Serebrennikov [[EMAIL PROTECTED]] has implemented a substantial extension to Lucene which should help folks doing this sort of research. It provides an explicit vector representation for documents. This way you can, e.g., retrieve a number of documents, efficiently sum their vectors, then

RE: IndexWriter thread safety

2002-03-04 Thread Doug Cutting
> From: Paul Dlug [mailto:[EMAIL PROTECTED]] > > Is IndexWriter.addDocument() thread safe? Yes. Doug

RE: Optimization and deletes

2002-02-28 Thread Doug Cutting
> From: Aruna Raghavan [mailto:[EMAIL PROTECTED]] > > I have noticed that unless I optimize the indexing while > adding documents to > it, the deleted documents are not getting physically deleted > right away > (even though they seemed to have been flagged as "deleted". > The searcher > could

RE: Wildcard Searching

2002-02-27 Thread Doug Cutting
> From: Howk, Michael [mailto:[EMAIL PROTECTED]] > > Also, Lucene returns the parsed version of each of our > searches. When we > search by rou*d, Lucene parses it as rou*d (which is what we > would expect). > But when we search by rou?d, Lucene parses it as "rou d". It > seems to wrap > the t

RE: Index Locked For Write

2002-02-26 Thread Doug Cutting
> From: Hayes, Mark [mailto:[EMAIL PROTECTED]] > > I understand there are three modes for using IndexReader and > IndexWriter: > > A- IndexReader for reading only, not deleting > B- IndexReader for deleting (and reading) > C- IndexWriter (for adding and optimizing) That sounds right. > Any nu

RE: Boolean Query Parsing with "IN" keyword

2002-02-26 Thread Doug Cutting
> From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]] > > But, StandardAnalyzer is no longer final (get the latest > build) and you > can write a class that subclasses it Right. To flesh out Otis' example of how to change StandardAnalyzer's stop list by defining a subclass of it: public class

RE: Googlifying lucene querys

2002-02-25 Thread Doug Cutting
> From: Joshua O'Madadhain [mailto:[EMAIL PROTECTED]] > > You cannot, in general, structure a Lucene query such that it > will yield > the same document rankings that Google would for that (query, document > set). The reason for this is that Google employs a scoring > algorithm that > includes

RE: Googlifying lucene querys

2002-02-25 Thread Doug Cutting
If you put the title in a separate field from the contents, and search both fields, matches in the title will usually be stronger, without explicit boosting. This is because the scores are normalized by the length of the field, and the title tends to be much shorter than the contents. So even wi

RE: Phrase problem

2002-02-20 Thread Doug Cutting
> From: David Elworthy [mailto:[EMAIL PROTECTED]] > I'm having a problem search on phrases. If I give the query > books by "Noam Chomsky" about politics > then I get a null pointer exception at the point where I issue the > query. > I'm using lucene 1.2 rc3. > > Any ideas? Upgrade to rc4. Th

RE: Printing queries

2002-02-19 Thread Doug Cutting
The method that is defined is: public String toString(String defaultField); Probably a method like the following should be added: public String toString() { return toString(""); } Doug > -Original Message- > From: David Elworthy [mailto:[EMAIL PROTECTED]] > Sent: Tuesday, February 19, 2002 2:3

RE: Searching numerical ranges

2002-02-19 Thread Doug Cutting
> From: David Elworthy [mailto:[EMAIL PROTECTED]] > > I want to be able to search on a field which contains a > numerical value, > specifying a range, such as 1-100. If my understanding of Lucene is > correct, all fields look essentially like strings, so a simple range > query won't work (after
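
The problem raised in the question can be shown with plain string comparison. The sketch below also shows zero-padding, a common workaround that is not taken from this truncated snippet; all names are hypothetical.

```java
import java.util.Locale;

final class PaddedNumberSketch {
    // Left-pads a non-negative number with zeros so that string order
    // agrees with numeric order for values of up to 'width' digits.
    static String pad(int value, int width) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values are not handled");
        }
        return String.format(Locale.ROOT, "%0" + width + "d", value);
    }

    // Inclusive range test using string (lexicographic) comparison only.
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }
}
```

Compared as strings, "20" falls outside "1" to "100" because "2" sorts after "1"; padded to a fixed width, "00020" falls between "00001" and "00100" as expected.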

RE: results sorting

2002-02-19 Thread Doug Cutting
> From: Chris Opler [mailto:[EMAIL PROTECTED]] > > Am wondering if there is any facility to sort search hits by > fields in the > Document. No, there's nothing like this built in to Lucene. This can be very expensive with large collections, since it requires reading a Document object for every

RE: Qs re: document scoring and semantics

2002-02-19 Thread Doug Cutting
> From: Joshua O'Madadhain [mailto:[EMAIL PROTECTED]] > > Is either of the expressions below the correct parenthesization of the > expression above? If not, what is? > > score_d = sum_t(tf_q * (idf_t / norm_q) * tf_d * (idf_t / norm_d_t) * > boost_t) * coord_q_d That's correct. The tf*idf wei

RE: Lucene Query Structure

2002-02-19 Thread Doug Cutting
> From: Joshua O'Madadhain [mailto:[EMAIL PROTECTED]] > > After considerable study of the documentation, I am still > confused about the semantics of BooleanQuery. > > Now, as sjb pointed out, "(query, false, false)" doesn't > really seem to have the semantics of a boolean OR. In fact, it doe

Lucene release 1.2 RC4

2002-02-14 Thread Doug Cutting
A new release of Lucene is available, 1.2 release candidate 4. The new release can be downloaded from: http://jakarta.apache.org/builds/jakarta-lucene/release/v1.2-rc4/ If no serious bugs are identified in the next few days, I will make a 1.2 final release. Release notes follow. Doug 1.2

RE: write.lock file

2002-02-14 Thread Doug Cutting
I cannot replicate the problem you are having. Can you please submit a complete, self-contained, test case illustrating the problem you are having with the write lock. Please test this against the latest nightly build of Lucene, from: http://jakarta.apache.org/builds/jakarta-lucene/nightly/ T

RE: using lucene with a very large index

2002-02-14 Thread Doug Cutting
> From: tal blum [mailto:[EMAIL PROTECTED]] > > 2) Does the Document id changes after merging indexes adding > or deleting documents? Yes. > 4) assuming I have a term query that has a large number of > hits say 10 millions, is there a way to get the say the top > 10 results without going thr

RE: PrefixQuery Scoring

2002-02-13 Thread Doug Cutting
> From: Jonathan Franzone [mailto:[EMAIL PROTECTED]] > > Whenever I add a PrefixQuery to my search the scoring gets > really small. For > example if I do a query like this: +java then the scoring > starts around > 0.866... and so forth. But if I do a query like this: +java* then the > scoring s

RE: problems with last patch (obtain write.lock while deleting documents)

2002-02-10 Thread Doug Cutting
> From: Daniel Calvo [mailto:[EMAIL PROTECTED]] > > Problem solved, thanks! Great! > BTW, is the way I'm doing the deletion the correct one? I > reckon I can't use a cached reader, since I have to close it after the > deletion to release the write lock. Does it make sense? Yes. Looks good to

RE: problems with last patch (obtain write.lock while deleting documents)

2002-02-10 Thread Doug Cutting
> From: Daniel Calvo [mailto:[EMAIL PROTECTED]] > > I've just updated my version (via CVS) and now I'm having > problems with document deletion. I'm trying to delete a document using > IndexReader's delete(Term) method and I'm getting an IOException: > > java.io.IOException: Index locked for wr

RE: PhraseQuery: NullPointerException

2002-02-08 Thread Doug Cutting
This bug has been fixed. The fix will be in tonight's nightly build. Doug

RE: Indexing and Searching happening together

2002-02-01 Thread Doug Cutting
> From: Kelvin Tan [mailto:[EMAIL PROTECTED]] > > True (and it's great) that once an IndexReader is open, no > actions on the IndexWriter affect it. > > However, if an IndexReader is opened _after_ indexing begins, > I suppose it'll throw an exception? Doesn't it mean that when > indexing is

RE: Obtaining all results efficiently. Closing a searcher.

2002-01-31 Thread Doug Cutting
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] > > Are you implying ( ... public synchronized Searcher > getSearcher()) to > use this synchronized method in a servlet/jsp thread as > well? Yes. > Your jhtml example doesn't appear to be > synchronized. Maybe I'm missing something thou

RE: Obtaining all results efficiently. Closing a searcher.

2002-01-31 Thread Doug Cutting
> From: Ype Kingma [mailto:[EMAIL PROTECTED]] > > Suppose I would like to retrieve all docs that are resulting > from a query. > I should then use the search() call with the HitCollector argument > which is called back with collect(docNr, score) > > Would it be wise to sort by docNr when using

RE: Indexing and Searching happening together

2002-01-31 Thread Doug Cutting
> From: Kelvin Tan [mailto:[EMAIL PROTECTED]] > > In the case where indexing takes a non-trivial amount of > time, what is the expected behaviour when a search is > performed while indexing is still going on? Once an IndexReader is open, no actions on an IndexWriter should affect it. Adding d

RE: Questions on index locking

2002-01-31 Thread Doug Cutting
> From: Matt Tucker [mailto:[EMAIL PROTECTED]] > > I'd like to > suggest that it might help to add some comments to the Javadocs of > IndexReader and IndexWriter about when directories are locked and what > it means. In short, an IndexWriter locks an index so that other IndexWriters cannot be ope

RE: Moving Index from Crawl/Build Server to Search Server

2002-01-31 Thread Doug Cutting
> From: Mark Tucker [mailto:[EMAIL PROTECTED]] > > What is the best way to > move the index from the build server to the search servers > and then change which index a user is searching against? I > am concerned about switching the index while a user is paging > through search results. Ide

RE: strange search problems(cannot query for more than the first 10000 words!?!)

2002-01-28 Thread Doug Cutting
> From: Karl Øie [mailto:[EMAIL PROTECTED]] > > I have created a testclass for working with Analyzers and ran > into a strange > problem; I cannot search for text in fields with more than > 10000 words!?!? Lucene by default stops indexing after the 10,000th token. See http://jakarta.apache.org
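
The effect of a token limit can be reproduced without Lucene. This sketch uses a whitespace tokenizer and a hypothetical maxTokens parameter to stand in for the 10,000-token default mentioned above; it is not the library's implementation.

```java
import java.util.ArrayList;
import java.util.List;

final class TokenLimitSketch {
    // Keeps only the first maxTokens tokens, as an indexer with a limit would.
    static List<String> indexedTokens(String text, int maxTokens) {
        List<String> kept = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (kept.size() >= maxTokens) {
                break; // everything past the limit is silently dropped
            }
            if (!token.isEmpty()) {
                kept.add(token);
            }
        }
        return kept;
    }

    // A word is findable only if it made it into the indexed tokens.
    static boolean searchable(String text, String word, int maxTokens) {
        return indexedTokens(text, maxTokens).contains(word);
    }
}
```

A word that appears only as token 10,001 is not found with a limit of 10,000, which matches the symptom reported in the question.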

release 1.2 RC3

2002-01-28 Thread Doug Cutting
A new release of Lucene is available, 1.2 release candidate 3. The new release can be downloaded from: http://jakarta.apache.org/builds/jakarta-lucene/release/v1.2-rc3/ If no major problems are identified in the next few days, we will make a 1.2 final release--the first final release since Luc

RE: Term ordering for IndexReader.termDocs()

2002-01-25 Thread Doug Cutting
> From: Ype Kingma [mailto:[EMAIL PROTECTED]] > > I'm creating a filter from a set of terms that are read from > a file, and I find that IndexReader.termDocs(Term(fieldName, > valueFromFile)) > does this quite well (around 0.1 secs elapsed time in jython code.) > > Would it be advantageous to so

RE: Case Sensitivity - and more

2002-01-24 Thread Doug Cutting
> From: Michal Plechawski > > I think that Brian's idea is more flexible and extendable. In my > application, I need three or more kinds of analyzers: for > counting tfidf > statistics, for indexing (compute more, e.g. summaries) and > for document > classification (compute document-to-class ass

RE: Case Sensitivity

2002-01-21 Thread Doug Cutting
Wildcard queries are case sensitive, while other queries depend on the analyzer used for the field searched. The standard analyzer lowercases, so lowercased terms are indexed. Thus your "SPINAL CORD" query is lowercased and matches the indexed terms "spinal" and "cord". However, since prefixes
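
The behaviour described above can be modelled in a few lines. The sketch assumes that ordinary query terms pass through the same lowercasing analyzer as the indexed text, while prefix and wildcard patterns are compared exactly as typed; the class is a toy model with hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class CaseSensitivitySketch {
    // Toy stand-in for a lowercasing analyzer.
    static List<String> analyze(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    // Ordinary term: the query text is analyzed, so case does not matter.
    static boolean termMatches(List<String> indexedTerms, String queryTerm) {
        return indexedTerms.contains(queryTerm.toLowerCase(Locale.ROOT));
    }

    // Prefix: assumed not analyzed, so it is compared exactly as typed.
    static boolean prefixMatches(List<String> indexedTerms, String prefix) {
        for (String term : indexedTerms) {
            if (term.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

With "Spinal Cord" indexed, the term "SPINAL" matches, the prefix "SPIN" does not, and the prefix "spin" does.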

RE: Parsing of queries.; NEAR queries

2002-01-17 Thread Doug Cutting
> From: Brian Goetz [mailto:[EMAIL PROTECTED]] > > Lots of possibilities exist, but so far they're all pretty yucky. > Suggestions? Here are a few more ideas, none of which I'm in love with. Use a postfix on phrases with tilde: "Mickey Minnie Goofy"~5 Or overloaded parentheses: NEAR5(Mick
