RE: Lucene 1.2 and directory write permissions?

2001-10-05 Thread Doug Cutting
> From: Snyder, David [mailto:[EMAIL PROTECTED]] > > I've been porting our application to use the 1.2 release > candidate 1 build > and now have a problem opening searchers on our existing > indexes. I get a > Permission Denied exception... our permissions are set up to > allow reading > of

RE: Lucene has moved to Jakarta

2001-10-05 Thread Doug Cutting
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] > > Congratulations on the move! Thanks! > As near as I can see, the two major changes for 1.2-rc1 are: > switch to org.apache.lucene package names. > Apache license instead of LGPL. Yes. Thanks for pointing these out. These are big i

RE: Lucene has moved to Jakarta

2001-10-05 Thread Doug Cutting
> From: William Wong [mailto:[EMAIL PROTECTED]] > > How about adding filters for different file types such as > -HTML (there is one in the demo already) > -XML > -PDF > -MsWord/RTF > -other common file formats These would be great. Who will implement them? I was only listing tasks that I plan t

RE: many analyzers, same index.

2001-10-19 Thread Doug Cutting
> From: Brandon Jockman [mailto:[EMAIL PROTECTED]] > > Is there anything wrong with using multiple analyzers on the > same index, (given of course that I am keeping the set of > documents for each mutually exclusive)? This should work. The primary risk is that they generate different terms th

RE: ordering of results (alphabetical?)

2001-10-19 Thread Doug Cutting
> From: Tom Barrett [mailto:[EMAIL PROTECTED]] > > I am wondering if it is possible to change the ordering of > results from > search. Specifically, we need to sort results alphabetically > by a given > field, rather then by term frequency. I assume there is a way > to do this > with HitCollec

RE: Context specific summary with the search term

2001-10-19 Thread Doug Cutting
> From: Lee Mallabone [mailto:[EMAIL PROTECTED]] > > This is something I also need to implement in the very near future. My > current thoughts are to use a variant of Maik Schreiber's way of doing > term highlighting in documents. See: > http://www.iq-computing.de/lucene/highlight.htm > > Rather

RE: database-based store

2001-10-19 Thread Doug Cutting
> From: Christophe [mailto:[EMAIL PROTECTED]] > > I'm very interrested to store lucene index in a database. > Currently, I use it in EJBs and I'd like to replace io access by jdbc > access. > It would be great that the jdbc solution don't required a > specific database > (ie : Oracle or MSsql).

RE: Request for more search examples

2001-10-19 Thread Doug Cutting
> From: Harmeet [mailto:[EMAIL PROTECTED]] > > Is it possible to add more examples. > > - I would love to see search examples using Query Primitives > like Boolean, > Term etc. The example uses QueryParser. Personally, I would love for someone to re-work the demo entirely, adding more example

new Lucene release: 1.2 RC2

2001-10-19 Thread Doug Cutting
I just posted a second release candidate for Lucene 1.2. This can be found at: http://jakarta.apache.org/builds/jakarta-lucene/release/v1.2-rc2/ If no serious bugs are found in the next few days, then I will make this the 1.2 final release. Changes from 1.2 RC1: - added sources to distributi

RE: new Lucene release: 1.2 RC2

2001-10-19 Thread Doug Cutting
> From: Scott Ganyo [mailto:[EMAIL PROTECTED]] > Sent: Friday, October 19, 2001 3:28 PM > To: Doug Cutting; '[EMAIL PROTECTED]' > Subject: RE: new Lucene release: 1.2 RC2 > > Well, we know of at least two issues: > 1) RAMDirectory not merging properly (reported by

RE: new Lucene release: 1.2 RC2

2001-10-22 Thread Doug Cutting
> From: Sunil Zanjad [mailto:[EMAIL PROTECTED]] > > >Indexes left in an inconsistent state on crash (i don't > > remember who > > I believe that even I have reported it. This happens on > abrupt exit of the JVM > To do this I had one thread updating a directory containing > many .txt fi

RE: Context specific summary with the search term

2001-10-22 Thread Doug Cutting
> From: Lee Mallabone [mailto:[EMAIL PROTECTED]] > > I'm trying to implement this and should be able to contribute any > succesful results, but I need to produce context on a per-field basis. > Eg. if I got a token hit in the text body of a document, but the first > hit token was a word in the se

RE: new Lucene release: 1.2 RC2

2001-10-22 Thread Doug Cutting
TECTED]] > Sent: Monday, October 22, 2001 12:49 PM > To: Doug Cutting; 'Scott Ganyo'; [EMAIL PROTECTED] > Subject: RE: new Lucene release: 1.2 RC2 > > > > From: Sunil Zanjad [mailto:[EMAIL PROTECTED]] > > > > >Indexes left in an inconsistent state on cra

RE: Context specific summary with the search term

2001-10-23 Thread Doug Cutting
> From: Lee Mallabone [mailto:[EMAIL PROTECTED]] > > > > How did the title ever get indexed as the title? > > I'm indexing HTML documents marked up with comments to indicate field > boundaries. So I'd typically have: > > > blurb > > more blurb > > and so on. The documents were indexed by l

RE: A method for "de-boosting" a term...

2001-10-25 Thread Doug Cutting
Alex, Can you please supply a simple reproducible example? When I set the boost for a term to zero then documents containing it do not come to the top. Nor do they go to the bottom. The boost is multiplied into the weight for the term, but the weights are then added into the document score, so
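A toy model of the arithmetic described above. It assumes, for illustration only, that a document's score is simply the sum of boost * weight over the query terms it contains. Real Lucene scoring also applies normalization and a coordination factor. The class name is made up. With a zero boost, the boosted term contributes nothing, so a document containing it scores the same as one without it. It is neither promoted nor demoted.

```java
import java.util.Map;

// Toy model (not Lucene code): score = sum over matched terms of boost * weight.
class BoostSketch {
    static double score(Map<String, Double> termWeights, Map<String, Double> boosts) {
        double total = 0.0;
        for (Map.Entry<String, Double> e : termWeights.entrySet()) {
            // A term with no explicit boost gets the neutral boost 1.0.
            double boost = boosts.getOrDefault(e.getKey(), 1.0);
            total += boost * e.getValue();
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Double> boosts = Map.of("spam", 0.0);
        // Both documents match "java"; only the first also contains "spam".
        double withSpam = score(Map.of("java", 0.5, "spam", 0.8), boosts);
        double withoutSpam = score(Map.of("java", 0.5), boosts);
        System.out.println(withSpam + " " + withoutSpam); // 0.5 0.5
    }
}
```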

RE: Segments not merging on delete

2001-10-25 Thread Doug Cutting
I'm not sure if this is the cause of your problems, but when you're doing deletions you need to close the reader before you open a writer, otherwise deletions can be lost. You're claiming that additions are lost, but could it really be that it is the deletions which have been lost? Try closing t

RE: Querying an exact string match ?

2001-10-31 Thread Doug Cutting
This should work. You should be able to find an un-tokenized field containing spaces with a TermQuery. Nothing should ever tokenize the string. Can you please supply a simple, self-contained example showing that this does not work? Thanks, Doug > -Original Message- > From: Winton Dav

RE: Problems with prohibited BooleanQueries

2001-10-31 Thread Doug Cutting
Lucene does not implement a standalone "NOT" query. (Probably BooleanQuery should throw an exception if all clauses are prohibited clauses.) Negation is only implemented with respect to other non-negated clauses. So you cannot directly model your query tree as a Lucene query tree. NOT nodes mu
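A set-based sketch of the semantics described here, namely that negation only works relative to non-negated clauses. This is not Lucene's implementation, and the class name is hypothetical. Matches come only from required or optional clauses, and prohibited clauses can remove documents from that set but never add to it. A query made only of prohibited clauses therefore matches nothing.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model of boolean clause semantics over sets of document numbers.
class BooleanSketch {
    static Set<Integer> evaluate(List<Set<Integer>> required,
                                 List<Set<Integer>> optional,
                                 List<Set<Integer>> prohibited) {
        Set<Integer> result = new TreeSet<>();
        if (!required.isEmpty()) {
            // With required clauses, only their intersection can match;
            // optional clauses would then affect scoring, not matching.
            result.addAll(required.get(0));
            for (Set<Integer> r : required) result.retainAll(r);
        } else {
            for (Set<Integer> o : optional) result.addAll(o);
        }
        // Prohibited clauses only subtract.
        for (Set<Integer> p : prohibited) result.removeAll(p);
        return result;
    }

    public static void main(String[] args) {
        // Only prohibited clauses: nothing to subtract from, so no matches.
        System.out.println(evaluate(List.of(), List.of(), List.of(Set.of(1, 2)))); // []
        // (a OR b) minus c.
        System.out.println(evaluate(List.of(), List.of(Set.of(1, 2), Set.of(3)),
                                    List.of(Set.of(2)))); // [1, 3]
    }
}
```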

RE: Problems with prohibited BooleanQueries

2001-11-01 Thread Doug Cutting
> From: Scott Ganyo [mailto:[EMAIL PROTECTED]] > > How difficult would it be to get BooleanQuery to do a > standalone NOT, do you > suppose? That would be very useful in my case. It would not be that difficult, but it would make queries slow. All terms not containing a term would need to be e

RE: Do range queries work?

2001-11-01 Thread Doug Cutting
Can folks please try to include complete, self-contained test cases when submitting bugs? It's not that hard, and makes it much easier to figure out what is going on. For example, I have attached a complete, self-contained test case for the bug reported below. It only took 50 lines. Interestin

RE: Do range queries work?

2001-11-01 Thread Doug Cutting
> From: Paul Friedman [mailto:[EMAIL PROTECTED]] > > It looks like there is a bug (besides the StandardAnalyzer > parsing 20-35 as a single term). The query in your example: > > search(searcher, analyzer, "FirstName:[a-k]"); > > is not finding the correct document. It is finding doc2, i

test code

2001-11-01 Thread Doug Cutting
> From: Brian Goetz [mailto:[EMAIL PROTECTED]] > > I'd like to see the existing test programs converted into > JUnit test cases > -- I'm willing to do this if someone will tell me how they > work and what > they're supposed to output and how to invoke them. These are mostly things that I wro

RE: Indexing problem

2001-11-02 Thread Doug Cutting
> From: Daryl Thachuk [mailto:[EMAIL PROTECTED]] > > A question I'd like answered is, why do I now have to be > concerned about > having too many files open when before I didn't? What has changed to > cause this? This sounds like a bug to me. Sigh. IndexReader now keeps all files that are no

RE: compile lucene-1.2-rc2

2001-11-02 Thread Doug Cutting
This looks like a good start for a top-level README.txt, which we need before the 1.2 final. Besides build instructions, this should include pointers to the documentation. Anything else folks can think of? One correction: We should not mention downloading JavaCC, just ant and ant-optional.jar.

RE: Maximum file size problem

2001-11-03 Thread Doug Cutting
Otis has already answered most of this. > From: Winton Davies [mailto:[EMAIL PROTECTED]] > > *** Anyway, is there anyway to control how big the indexes > grow ? The easiest thing is to set IndexWriter.maxMergeDocs. Since you hit 2GB at 8M docs, set this to 7M. That will keep Lucene fro

RE: Memory Usage?

2001-11-12 Thread Doug Cutting
> > org.apache.lucene.index.TermInfosReader.scanEnum(TermInfosRead > er.java:166) > > > I've attached the whole trace as gzipped.txt > > regards, > Anders Nielsen > > -Original Message- > From: Doug Cutting [mailto:[EMAIL PROTECTED]] > Sent:

RE: Memory Usage?

2001-11-12 Thread Doug Cutting
> From: Anders Nielsen [mailto:[EMAIL PROTECTED]] > > this was a big boolean query, with several prefixqueries but > no wildcard > queries in the or-branches. Well it looks like those prefixes are expanding to a lot of terms, a total of over 40,000! (A prefix query expands into a BooleanQuery
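A plain-Java sketch of why a short prefix can get expensive. Walking a sorted term dictionary from the prefix onward yields every term that starts with it, and in the expansion mentioned here each such term becomes one clause of a boolean query. The in-memory `TreeSet` and the class name are illustrative stand-ins, not Lucene's term dictionary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Toy prefix expansion over a sorted set of terms.
class PrefixSketch {
    static List<String> expand(NavigableSet<String> terms, String prefix) {
        List<String> out = new ArrayList<>();
        for (String t : terms.tailSet(prefix, true)) {
            // Terms are sorted, so the first non-match ends the run.
            if (!t.startsWith(prefix)) break;
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableSet<String> dict =
            new TreeSet<>(List.of("jar", "java", "javadoc", "javascript", "jazz"));
        System.out.println(expand(dict, "java")); // [java, javadoc, javascript]
    }
}
```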

RE: Memory Usage?

2001-11-12 Thread Doug Cutting
> From: Scott Ganyo [mailto:[EMAIL PROTECTED]] > > I think something like this would be a HUGE boon for us. We > do a lot of > complex queries on a lot of different indexes and end up > suffering from > severe garbage collection issues on our system. I'd be > willing to help out > in any way

RE: Memory Usage?

2001-11-12 Thread Doug Cutting
> From: Anders Nielsen [mailto:[EMAIL PROTECTED]] > > hmm, I seem to be getting a different number of hits when I > use the files > you sent out. Please provide more information! Is it larger or smaller than before? By how much? What differences show up in the hits? That's a terrible bug re

RE: RAMdirectory from Directory ?

2001-11-15 Thread Doug Cutting
> From: Winton Davies [mailto:[EMAIL PROTECTED]] > > It loaded into memory, but then when I tried a search I got > this stack trace: > > java.lang.ArrayIndexOutOfBoundsException > at java.lang.System.arraycopy(Native Method) > at > org.apache.lucene.store.RAMInputStream.readI

RE: Efficient doc information retrieval.

2001-11-15 Thread Doug Cutting
> From: Winton Davies [mailto:[EMAIL PROTECTED]] > > Not really, all documents have an accountID, but I need to search > all the documents > first, and each document that is returned has an accountID, but I > just want one document > per accountID. > > so: > > doc1 acc1 > doc2 acc1 >
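One simple way to get "one document per accountID" from an ordinary result list is to keep the first (highest-scoring) hit seen for each account. This sketch assumes hits arrive in descending score order and that each hit's accountID is already available. In practice that means reading a stored field per hit, which another message in this listing notes is costly. The record and class names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical hit: document number, score, and its accountID value.
record AccountHit(int doc, double score, String accountId) {}

class DedupSketch {
    // Keeps the best-scoring hit per accountID, preserving score order.
    static List<AccountHit> onePerAccount(List<AccountHit> hitsByScore) {
        Set<String> seen = new HashSet<>();
        List<AccountHit> out = new ArrayList<>();
        for (AccountHit h : hitsByScore) {
            if (seen.add(h.accountId())) out.add(h); // add() is false for repeats
        }
        return out;
    }

    public static void main(String[] args) {
        List<AccountHit> hits = List.of(
            new AccountHit(1, 0.9, "acc1"),
            new AccountHit(2, 0.8, "acc1"),
            new AccountHit(3, 0.7, "acc2"));
        for (AccountHit h : onePerAccount(hits)) System.out.println(h.doc());
    }
}
```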

RE: extracting information from an index

2001-11-16 Thread Doug Cutting
> From: Joshua O'Madadhain [mailto:[EMAIL PROTECTED]] > > I am trying to construct a term-term correlation matrix from the data > stored in the index, for an extension to the vector model that I am > researching. In case my terminology is unfamiliar, what I > need in order > to do this is, for

RE: extracting information from an index

2001-11-16 Thread Doug Cutting
Please don't email me directly. Keep Lucene questions on Lucene lists. > From: Winton Davies [mailto:[EMAIL PROTECTED]] > > Doug, I had the same question regarding enumerating the documents for > accountIDs. > > I dont think I have any field that I can quickly guarantee that a > term will wor

RE: Sorting Options for Query Results

2001-11-16 Thread Doug Cutting
This is not easy to do efficiently. The efficiency of the search code depends on not constructing Document objects for every match. Thus it is hard to efficiently perform calculations which require field values. Things are easy if you need date order, and you have added documents in date order.
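A sketch of the "easy" date-order case. If documents were added in date order, document numbers follow that order, so sorting matching document numbers gives date order without loading any stored fields. The callback is only shaped like a hit collector and is not Lucene's interface. The sketch also assumes the relative order of document numbers still reflects insertion order. The thread notes that numbers change when documents are deleted or indexes merged.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Collects matching doc numbers, then orders them newest-first.
class DateOrderSketch {
    private final List<Integer> docs = new ArrayList<>();

    // Called once per matching document; the score is ignored here.
    void collect(int doc, float score) {
        docs.add(doc);
    }

    // Higher doc number = added later = newer (under the stated assumption).
    List<Integer> newestFirst() {
        List<Integer> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.reverseOrder());
        return sorted;
    }

    public static void main(String[] args) {
        DateOrderSketch c = new DateOrderSketch();
        c.collect(4, 0.2f);
        c.collect(9, 0.9f);
        c.collect(1, 0.5f);
        System.out.println(c.newestFirst()); // [9, 4, 1]
    }
}
```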

RE: Sorting Options for Query Results

2001-11-19 Thread Doug Cutting
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] > > I think this still works if the the document number continue > to increase > by one when documents are added incrementally. > Does anyone know if this is true (I haven't looked at the code yet). Yes, that is true, so long as you do not d

RE: Sorting Options for Query Results

2001-11-19 Thread Doug Cutting
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] > > What happens to the document numbers when documents are > deleted and the > segment merged? > Is the document number of all the documents reset to be > sequential based > on some offset for each segment? > > Is there any pattern that m

RE: Attribute Search

2001-11-26 Thread Doug Cutting
> From: New, Cecil (GEAE) [mailto:[EMAIL PROTECTED]] > > this is exactly what I was doing. Store=false, index=true, > and token=false. > > It appeared to work ok, but searches *never* returned any hits. > > That's why I suspect it is a bug. If you think this is a bug, please submit a test ca

RE: synchronization problem / bug?

2001-11-27 Thread Doug Cutting
Lucene assumes that if an open file is deleted then it will continue to be readable at least until it is closed. So if your SQL Directory is permitting open files to be deleted then you will encounter problems. All open files should be somehow locked against deletion. If a file deletion fails L

RE: IndexReader and IndexWriter on the same index

2001-11-27 Thread Doug Cutting
If you are performing additions and deletions then you should serially create an IndexReader to do deletions, close it, then create an IndexWriter to do additions, close it, and so on. Note that typically one will use a different IndexReader for deletions than is used for searching, so that searc

RE: IndexReader and IndexWriter on the same index

2001-11-27 Thread Doug Cutting
> From: Avi Drissman [mailto:[EMAIL PROTECTED]] > > But if I need to do a deletion before every addition, then there's > the overhead of all those reader and writer creations. There's no way > around it? Batch your deletions, then batch your additions. If you need these changes to appear atom

RE: Parallelising a query...

2001-11-29 Thread Doug Cutting
> From: Winton Davies [mailto:[EMAIL PROTECTED]] > >I have 4 million documents... I could: > >Split these into 4 x 1 million document indexes and then send a > query to 4 Lucene processes ? At the end I would have to sort the > results by relevance. > >Question for Doug or any o
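The question proposes querying four sub-indexes and then sorting the combined results by relevance. A minimal sketch of that final merge step follows. It assumes scores from different sub-indexes are directly comparable, which is only approximately true because each index has its own term statistics. Document numbers are local to each sub-index, so a hit is identified by the (shard, doc) pair. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical hit from one sub-index; doc is local to that shard.
record ShardHit(int shard, int doc, double score) {}

class MergeSketch {
    // Combines per-shard results into one list by descending score, keeping n.
    static List<ShardHit> mergeTopN(List<List<ShardHit>> perShard, int n) {
        List<ShardHit> all = new ArrayList<>();
        for (List<ShardHit> hits : perShard) all.addAll(hits);
        all.sort(Comparator.comparingDouble(ShardHit::score).reversed());
        return new ArrayList<>(all.subList(0, Math.min(n, all.size())));
    }

    public static void main(String[] args) {
        List<ShardHit> top = mergeTopN(List.of(
            List.of(new ShardHit(0, 1, 0.9), new ShardHit(0, 2, 0.3)),
            List.of(new ShardHit(1, 7, 0.8), new ShardHit(1, 3, 0.1))), 3);
        top.forEach(System.out::println);
    }
}
```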

RE: Transactional Indexing

2001-11-29 Thread Doug Cutting
> From: New, Cecil (GEAE) [mailto:[EMAIL PROTECTED]] > > I have noticed that when I kill/interrupt an indexing process, that it > leaves a "lock" file, preventing further indexing. > > This raises a couple of questions: > a. When I simply delete the file and restart the indexing, it > seems to

RE: Attribute Search Bug

2001-11-29 Thread Doug Cutting
ace(); > } > > // now make a query and search for document > Searcher searcher = new IndexSearcher(indexPath); > Query query = QueryParser.parse(fieldValue, fieldName, > new SimpleAnalyzer()); > Hits hits = searcher.search(query); >

RE: Parallelising a query...

2001-11-29 Thread Doug Cutting
TermDocs are ordered by document number. It would not be easy to change this. Doug > -Original Message- > From: Winton Davies [mailto:[EMAIL PROTECTED]] > Sent: Thursday, November 29, 2001 11:12 AM > To: Lucene Users List > Subject: Re: Parallelising a query... > > > Hi again > >

RE: Does Lucene really work with Java 1.1.8

2001-10-09 Thread Doug Cutting
> From: Brook, James [mailto:[EMAIL PROTECTED]] > > I am trying to use the 'lucene-1.2-rc1.jar' with a WebObjects 4.5 > application, but having problems. WebObjects uses Java 1.1.8. > I read on the > jGuru Lucene FAQ that Lucene should work with this version of > Java. Is this > correct? It sh

RE: File Handles issue

2001-10-11 Thread Doug Cutting
> From: Scott Ganyo [mailto:[EMAIL PROTECTED]] > > We're having a heck of a time with too many file handles > around here. When > we create large indexes, we often get thousands of temporary > files in a given index! Thousands, eh? That seems high. The maximum number of segments should be f

RE: Index Optimization: Which is Better?

2001-10-11 Thread Doug Cutting
Elliot, I'm having trouble getting a clear picture of your indexing scheme. Could you provide some simple examples, e.g., for the xml: this is some text and some other text would you have something like the following? doc1 node_type: tag1 contents: this is some text doc2

RE: File Handles issue

2001-10-15 Thread Doug Cutting
> From: Scott Ganyo [mailto:[EMAIL PROTECTED]] > > Thanks for the detailed information, Doug! That helps a lot. > > Based on what you've said and on taking a closer look at the > code, it looks > like by setting mergeFactor and maxMergeDocs to > Integer.MAX_VALUE, an entire > index will be bu

RE: number of terms vs. number of fields

2001-12-03 Thread Doug Cutting
Lucene counts the same string in different fields as a different term. In other words, a term is composed of a field and a string. Doug > -Original Message- > From: Joshua O'Madadhain [mailto:[EMAIL PROTECTED]] > Sent: Saturday, December 01, 2001 6:55 PM > To: [EMAIL PROTECTED] > Subjec
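The identity rule stated here, that a term is a (field, string) pair, can be modelled with a record whose equality covers both components. The same text in two fields then counts as two distinct terms. The record and class names are illustrative, not Lucene's `Term` class.

```java
import java.util.HashSet;
import java.util.List;

// A term is identified by both its field and its text.
record FieldTerm(String field, String text) {}

class TermCountSketch {
    static int distinctTerms(List<FieldTerm> occurrences) {
        return new HashSet<>(occurrences).size(); // record equals() uses both parts
    }

    public static void main(String[] args) {
        System.out.println(distinctTerms(List.of(
            new FieldTerm("title", "lucene"),
            new FieldTerm("body", "lucene"),
            new FieldTerm("body", "lucene")))); // 2
    }
}
```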

RE: prefix query with multiple words

2001-12-04 Thread Doug Cutting
In short, this is not currently supported, but might be someday. For more details, see my recent response to a message with subject "RE: Near without slop". Doug > -Original Message- > From: Tom Barrett [mailto:[EMAIL PROTECTED]] > Sent: Monday, December 03, 2001 3:42 PM > To: [EMAIL PR

RE: Near without slop

2001-12-04 Thread Doug Cutting
> From: Paddy Clark [mailto:[EMAIL PROTECTED]] > > > >My current "NEAR" solution is to modify the query parser to build a > >PhraseQuery from the terms surrounding NEAR and set the slop > >correctly. This works for this kind of query: > > > >Bob NEAR Jim > > > >The problem comes when I try > >

RE: existing or not existing

2001-12-05 Thread Doug Cutting
> From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]] > > You could try looking for a segments file in the index directory. > If it exists, the index exists, else it does not. > > Is there a better way? I think that's currently the best way. But it's not great, because it requires applications t
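The heuristic discussed in this exchange checks whether the index directory contains a file named "segments". It is sketched below with `java.nio.file`. It is tied to the on-disk layout of that era (later Lucene versions changed how this file is named), and the class name is hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Heuristic only: an index "exists" if a "segments" file is present.
class IndexExistsSketch {
    static boolean indexExists(Path dir) {
        return Files.isRegularFile(dir.resolve("segments"));
    }

    public static void main(String[] args) {
        Path dir = Path.of(args.length > 0 ? args[0] : ".");
        System.out.println(indexExists(dir));
    }
}
```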

RE: Indexing other documents type than html and txt (XML)

2001-11-30 Thread Doug Cutting
> From: Erik Hatcher [mailto:[EMAIL PROTECTED]] > > How does lucene-dev feel about creating a 'contrib' area in > CVS for these > kinds of things that folks really need to make Lucene come to > life for them, > but are obviously not part of the main engine? I think this is a fine idea, but it

RE: suggestion

2001-12-07 Thread Doug Cutting
Can you provide some examples of tags that you think would be useful? Would you like to implement these? Doug > -Original Message- > From: YMikulski [mailto:[EMAIL PROTECTED]] > Sent: Friday, December 07, 2001 7:50 AM > To: users LUCENE > Subject: suggestion > > > Hello! > I like to s

RE: concurrent usage summary

2001-12-07 Thread Doug Cutting
> From: [EMAIL PROTECTED] > > It looks like Lucene supports concurrent searchs as long as > the index is not > modified with add, delete, or optimize actions (and maybe > others?). Searches may be made at any time, including while the index is being modified by the add, delete and optimize ope

RE: Industry Use of Lucene?

2001-12-07 Thread Doug Cutting
Kelvin, I don't see "powered by Lucene" on your results pages: http://www.relevanz.com/Search?query=media If you add this, we can add you to the "Powered by Lucene" page: http://jakarta.apache.org/lucene/docs/powered.html What other sites should be added to this page? Doug > -Origin

RE: Include ANT in CVS?

2001-12-10 Thread Doug Cutting
> From: Paul Spencer [mailto:[EMAIL PROTECTED]] > > I suggest you include ant in the CVS. I believe this is a common > practice for Jakarta projects. Just because it is common practice does not mean that it is a good idea. Why do you think Ant should be included in Lucene's CVS? What would i

RE: Using a DateFilter without a query

2001-12-17 Thread Doug Cutting
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] > > Useful alternative to DateFilter is using RangeQuery, > containing string > representations of date objects (DateField.dateToString()) > for lower and upper term. Probably we should create a subclass of RangeQuery called DateQuery that ma

RE: Parsing of queries.; NEAR queries

2002-01-17 Thread Doug Cutting
> From: Brian Goetz [mailto:[EMAIL PROTECTED]] > > Lots of possibilities exist, but so far they're all pretty yucky. > Suggestions? Here are a few more ideas, none of which I'm in love with. Use a postfix on phrases with tilde: "Mickey Minnie Goofy"~5 Or overloaded parentheses: NEAR5(Mick

RE: Case Sensitivity

2002-01-21 Thread Doug Cutting
Wildcard queries are case sensitive, while other queries depend on the analyzer used for the field searched. The standard analyzer lowercases, so lowercased terms are indexed. Thus your "SPINAL CORD" query is lowercased and matches the indexed terms "spinal" and "cord". However, since prefixes
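A sketch of the case-sensitivity point. Wildcard patterns are compared against indexed terms as-is, so when the analyzer lowercased the text at index time, an uppercase pattern matches nothing unless it is lowercased first. The wildcard-to-regex translation (`*` = any run of characters, `?` = exactly one) is a stand-in for illustration, not Lucene's WildcardQuery code.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

class WildcardSketch {
    // Translates '*' and '?' into regex; every other character is literal.
    static Pattern toRegex(String wildcard) {
        StringBuilder re = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') re.append(".*");
            else if (c == '?') re.append('.');
            else re.append(Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(re.toString());
    }

    static List<String> matches(Collection<String> indexedTerms, String wildcard) {
        Pattern p = toRegex(wildcard);
        List<String> out = new ArrayList<>();
        for (String t : indexedTerms) if (p.matcher(t).matches()) out.add(t);
        return out;
    }

    public static void main(String[] args) {
        List<String> indexed = List.of("spinal", "spine", "cord"); // lowercased by analyzer
        System.out.println(matches(indexed, "SPIN*")); // []
        System.out.println(matches(indexed, "SPIN*".toLowerCase(Locale.ROOT))); // [spinal, spine]
    }
}
```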

RE: Case Sensitivity - and more

2002-01-24 Thread Doug Cutting
> From: Michal Plechawski > > I think that Brian's idea is more flexible and extendable. In my > application, I need three or more kinds of analyzers: for > counting tfidf > statistics, for indexing (compute more, e.g. summaries) and > for document > classification (compute document-to-class ass

RE: Term ordering for IndexReader.termDocs()

2002-01-25 Thread Doug Cutting
> From: Ype Kingma [mailto:[EMAIL PROTECTED]] > > I'm creating a filter from a set of terms that are read from > a file, and I find that IndexReader.termDocs(Term(fieldName, > valueFromFile)) > does this quite well (around 0.1 secs elapsed time in jython code.) > > Would it be advantageous to so

RE: strange search problems(cannot query for more than the first 10000 words!?!)

2002-01-28 Thread Doug Cutting
> From: Karl Øie [mailto:[EMAIL PROTECTED]] > > I have created a testclass for working with Analyzers and ran > into a strange > problem; I cannot search for text in fields with more than > 1 words!?!? Lucene by default stops indexing after the 10,000th token. See http://jakarta.apache.org
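A simulation of the default cap described here, where tokens after the 10,000th in a field are never indexed and so cannot be found by search. The whitespace tokenizer and the `maxFieldLength` parameter name are illustrative assumptions, not Lucene's analyzer or configuration API.

```java
import java.util.HashSet;
import java.util.Set;

// Only the first maxFieldLength tokens of a field end up in the index.
class FieldLengthSketch {
    static Set<String> indexedTokens(String text, int maxFieldLength) {
        Set<String> indexed = new HashSet<>();
        if (text.isBlank()) return indexed;
        String[] tokens = text.trim().split("\\s+");
        for (int i = 0; i < tokens.length && i < maxFieldLength; i++) {
            indexed.add(tokens[i]);
        }
        return indexed; // tokens past the cap are silently dropped
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i <= 10000; i++) sb.append('w').append(i).append(' ');
        Set<String> idx = indexedTokens(sb.toString(), 10000);
        System.out.println(idx.contains("w9999") + " " + idx.contains("w10000")); // true false
    }
}
```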

release 1.2 RC3

2002-01-28 Thread Doug Cutting
A new release of Lucene is available, 1.2 release candidate 3. The new release can be downloaded from: http://jakarta.apache.org/builds/jakarta-lucene/release/v1.2-rc3/ If no major problems are identified in the next few days, we will make a 1.2 final release--the first final release since Luc

RE: Moving Index from Crawl/Build Server to Search Server

2002-01-31 Thread Doug Cutting
> From: Mark Tucker [mailto:[EMAIL PROTECTED]] > > What is the best way to > move the index from the build server to the search servers > and then change which index a user is searching against? I > am concerned about switching the index while a user is paging > through search results. Ide

RE: Questions on index locking

2002-01-31 Thread Doug Cutting
> From: Matt Tucker [mailto:[EMAIL PROTECTED]] > > I'd like to > suggest that it might help to add some comments to the Javadocs of > IndexReader and IndexWriter about when directories are locked and what > it means. In short, an IndexWriter locks an index so that other IndexWriters cannot be ope

RE: Indexing and Searching happening together

2002-01-31 Thread Doug Cutting
> From: Kelvin Tan [mailto:[EMAIL PROTECTED]] > > In the case where indexing takes a non-trivial amount of > time, what is the expected behaviour when a search is > performed while indexing is still going on? Once an IndexReader is open, no actions on an IndexWriter should affect it. Adding d

RE: Obtaining all results efficiently. Closing a searcher.

2002-01-31 Thread Doug Cutting
> From: Ype Kingma [mailto:[EMAIL PROTECTED]] > > Suppose I would like to retrieve all docs that are resulting > from a query. > I should then use the search() call with the HitCollector argument > which is called back with collect(docNr, score) > > Would it be wise to sort by docNr when using

RE: Obtaining all results efficiently. Closing a searcher.

2002-01-31 Thread Doug Cutting
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] > > Are you implying ( ... public synchronized Searcher > getSearcher()) to > use this synchronized method in a servlet/jsp thread as > well? Yes. > Your jhtml example doesn't appear to > synchronzied. Maybe I'm missing something thou

RE: Indexing and Searching happening together

2002-02-01 Thread Doug Cutting
> From: Kelvin Tan [mailto:[EMAIL PROTECTED]] > > True (and it's great) that once an IndexReader is open, no > actions on the IndexWriter affect it. > > However, if an IndexReader is opened _after_ indexing begins, > I suppose it'll throw an exception? Doesn't it mean that when > indexing is

RE: PhraseQuery: NullPointerException

2002-02-08 Thread Doug Cutting
This bug has been fixed. The fix will be in tonight's nightly build. Doug

RE: problems with last patch (obtain write.lock while deleting documents)

2002-02-10 Thread Doug Cutting
> From: Daniel Calvo [mailto:[EMAIL PROTECTED]] > > I've just updated my version (via CVS) and now I'm having > problems with document deletion. I'm trying to delete a document using > IndexReader's delete(Term) method and I'm getting an IOException: > > java.io.IOException: Index locked for wr

RE: problems with last patch (obtain write.lock while deleting documents)

2002-02-10 Thread Doug Cutting
> From: Daniel Calvo [mailto:[EMAIL PROTECTED]] > > Problem solved, thanks! Great! > BTW, is the way I'm doing the deletion the correct one? I > reckon I can't use a cached reader, since I have to close it after the > deletion to release the write lock. Does it make sense? Yes. Looks good to

RE: PrefixQuery Scoring

2002-02-13 Thread Doug Cutting
> From: Jonathan Franzone [mailto:[EMAIL PROTECTED]] > > Whenever I add a PrefixQuery to my search the scoring gets > really small. For > example if I do a query like this: +java then the scoring > starts around > 0.866... and so forth. But if I do a query like this: +java* then the > scoring s

RE: using lucene with a very large index

2002-02-14 Thread Doug Cutting
> From: tal blum [mailto:[EMAIL PROTECTED]] > > 2) Does the Document id changes after merging indexes adding > or deleting documents? Yes. > 4) assuming I have a term query that has a large number of > hits say 10 millions, is there a way to get the say the top > 10 results without going thr
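The reply to question 4 is truncated, so this is not necessarily what it recommends. A standard technique for keeping only the top 10 of millions of hits is a bounded min-heap. Every match is still scored once, but memory stays proportional to k rather than to the number of hits. Names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

record ScoredDoc(int doc, double score) {}

// Keeps the k highest-scoring docs; the heap's head is the weakest of them.
class TopKSketch {
    private final int k;
    private final PriorityQueue<ScoredDoc> heap =
        new PriorityQueue<>(Comparator.comparingDouble(ScoredDoc::score));

    TopKSketch(int k) {
        if (k <= 0) throw new IllegalArgumentException("k must be positive");
        this.k = k;
    }

    void collect(int doc, double score) {
        if (heap.size() < k) {
            heap.add(new ScoredDoc(doc, score));
        } else if (score > heap.peek().score()) {
            heap.poll(); // evict the current weakest
            heap.add(new ScoredDoc(doc, score));
        }
    }

    // Doc numbers ordered by descending score.
    List<Integer> topDocs() {
        List<ScoredDoc> sorted = new ArrayList<>(heap);
        sorted.sort(Comparator.comparingDouble(ScoredDoc::score).reversed());
        List<Integer> docs = new ArrayList<>();
        for (ScoredDoc d : sorted) docs.add(d.doc());
        return docs;
    }

    public static void main(String[] args) {
        TopKSketch top = new TopKSketch(2);
        top.collect(0, 0.1);
        top.collect(1, 0.9);
        top.collect(2, 0.5);
        top.collect(3, 0.7);
        System.out.println(top.topDocs()); // [1, 3]
    }
}
```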

RE: write.lock file

2002-02-14 Thread Doug Cutting
I cannot replicate the problem you are having. Can you please submit a complete, self-contained, test case illustrating the problem you are having with the write lock. Please test this against the latest nightly build of Lucene, from: http://jakarta.apache.org/builds/jakarta-lucene/nightly/ T

Lucene release 1.2 RC4

2002-02-14 Thread Doug Cutting
A new release of Lucene is available, 1.2 release candidate 4. The new release can be downloaded from: http://jakarta.apache.org/builds/jakarta-lucene/release/v1.2-rc4/ If no serious bugs are identified in the next few days, I'll make a 1.2 final release. Release notes follow. Doug 1.2

RE: results sorting

2002-02-19 Thread Doug Cutting
> From: Chris Opler [mailto:[EMAIL PROTECTED]] > > Am wondering if there is any facility to sort search hits by > fields in the > Document. No, there's nothing like this built into Lucene. This can be very expensive with large collections, since it requires reading a Document object for every

RE: Lucene Query Structure

2002-02-19 Thread Doug Cutting
> From: Joshua O'Madadhain [mailto:[EMAIL PROTECTED]] > > After considerable study of the documentation, I am still > confused about the semantics of BooleanQuery. > > Now, as sjb pointed out, "(query, false, false)" doesn't > really seem to have the semantics of a boolean OR. In fact, it doe

RE: Qs re: document scoring and semantics

2002-02-19 Thread Doug Cutting
> From: Joshua O'Madadhain [mailto:[EMAIL PROTECTED]] > > Is either of the expressions below the correct parenthesization of the > expression above? If not, what is? > > score_d = sum_t(tf_q * (idf_t / norm_q) * tf_d * (idf_t / norm_d_t) * > boost_t) * coord_q_d That's correct. The tf*idf wei
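The expression confirmed as correct in this reply, typeset for readability. The symbols are as in the quoted question, and the sum runs over the query terms t:

```latex
\mathrm{score}_d = \sum_{t} \left( \mathrm{tf}_q \cdot \frac{\mathrm{idf}_t}{\mathrm{norm}_q} \cdot \mathrm{tf}_d \cdot \frac{\mathrm{idf}_t}{\mathrm{norm}_{d,t}} \cdot \mathrm{boost}_t \right) \cdot \mathrm{coord}_{q,d}
```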

RE: Searching numerical ranges

2002-02-19 Thread Doug Cutting
> From: David Elworthy [mailto:[EMAIL PROTECTED]] > > I want to be able to search on a field which contains a > numerical value, > specifying a range, such as 1-100. If my understanding of Lucene is > correct, all fields look essentially like strings, so a simple ranhe > query won't work (after
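The reply is truncated here. One common workaround, not necessarily the one it goes on to describe, is to store numbers as fixed-width, zero-padded strings so that string order agrees with numeric order, which is what a term-range comparison relies on. This sketch handles non-negative ints only, and the class name is hypothetical.

```java
import java.util.Locale;

// Zero-pads non-negative ints so lexicographic order == numeric order.
class PadSketch {
    static final int WIDTH = 10; // Integer.MAX_VALUE has 10 digits

    static String pad(int n) {
        if (n < 0) throw new IllegalArgumentException("negative values need another encoding");
        return String.format(Locale.ROOT, "%0" + WIDTH + "d", n);
    }

    public static void main(String[] args) {
        System.out.println("2".compareTo("10") > 0);           // true: unpadded order is wrong
        System.out.println(pad(2).compareTo(pad(10)) < 0);     // true: padded order is right
        System.out.println(pad(42));                           // 0000000042
    }
}
```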

RE: Printing queries

2002-02-19 Thread Doug Cutting
The method that is defined is: public String toString(String defaultField); Probably a method like the following should be added: public String toString() { return toString(""); } Doug > -Original Message- > From: David Elworthy [mailto:[EMAIL PROTECTED]] > Sent: Tuesday, February 19, 2002 2:3

RE: Phrase problem

2002-02-20 Thread Doug Cutting
> From: David Elworthy [mailto:[EMAIL PROTECTED]] > I'm having a problem search on phrases. If I give the query > books by "Noam Chomsky" about politics > then I get a null pointer exception at the point where I issue the > query. > I'm using lucene 1.2 rc3. > > Any ideas? Upgrade to rc4. Th

RE: Googlifying lucene querys

2002-02-25 Thread Doug Cutting
If you put the title in a separate field from the contents, and search both fields, matches in the title will usually be stronger, without explicit boosting. This is because the scores are normalized by the length of the field, and the title tends to be much shorter than the contents. So even wi
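A toy illustration of the length-normalization effect described here. The same raw match weight counts for more in a short field such as a title. The 1/sqrt(length) factor is an assumption chosen for illustration, and the class name is hypothetical.

```java
// Same raw weight, divided by sqrt(field length in terms).
class LengthNormSketch {
    static double normalized(double rawWeight, int fieldLengthInTerms) {
        return rawWeight / Math.sqrt(fieldLengthInTerms);
    }

    public static void main(String[] args) {
        System.out.println(normalized(1.0, 4));   // short title: 0.5
        System.out.println(normalized(1.0, 400)); // long body: 0.05
    }
}
```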

RE: Googlifying lucene querys

2002-02-25 Thread Doug Cutting
> From: Joshua O'Madadhain [mailto:[EMAIL PROTECTED]] > > You cannot, in general, structure a Lucene query such that it > will yield > the same document rankings that Google would for that (query, document > set). The reason for this is that Google employs a scoring > algorithm that > includes

RE: Boolean Query Parsing with "IN" keyword

2002-02-26 Thread Doug Cutting
> From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]] > > But, StandardAnalyzer is no longer final (get the latest > build) and you > can write a class that subclasses it Right. To flesh out Otis' example of how to change StandardAnalyzer's stop list by defining a subclass of it: public class

RE: Index Locked For Write

2002-02-26 Thread Doug Cutting
> From: Hayes, Mark [mailto:[EMAIL PROTECTED]] > > I understand there are three modes for using IndexReader and > IndexWriter: > > A- IndexReader for reading only, not deleting > B- IndexReader for deleting (and reading) > C- IndexWriter (for adding and optimizing) That sounds right. > Any nu

RE: Wildcard Searching

2002-02-27 Thread Doug Cutting
> From: Howk, Michael [mailto:[EMAIL PROTECTED]] > > Also, Lucene returns the parsed version of each of our > searches. When we > search by rou*d, Lucene parses it as rou*d (which is what we > would expect). > But when we search by rou?d, Lucene parses it as "rou d". It > seems to wrap > the t

RE: Optimization and deletes

2002-02-28 Thread Doug Cutting
> From: Aruna Raghavan [mailto:[EMAIL PROTECTED]] > > I have noticed that unless I optimize the indexing while > adding documents to > it, the deleted documents are not getting physically deleted > right away > (even though they seemed to have been flagged as "deleted". > The searcher > could

RE: IndexWriter thread safety

2002-03-04 Thread Doug Cutting
> From: Paul Dlug [mailto:[EMAIL PROTECTED]] > > Is IndexWriter.addDocument() thread safe? Yes. Doug

RE: Relevance Feedback

2002-03-30 Thread Doug Cutting
Dmitry Serebrennikov [[EMAIL PROTECTED]] has implemented a substantial extension to Lucene which should help folks doing this sort of research. It provides an explicit vector representation for documents. This way you can, e.g., retrieve a number of documents, efficiently sum their vectors, then

RE: corrupted index

2002-04-02 Thread Doug Cutting
Hinrich, Can you please send a stack trace? As others have mentioned, there isn't an index integrity checker. Doug P.S. Hi! How are you? > -Original Message- > From: H S [mailto:[EMAIL PROTECTED]] > Sent: Monday, April 01, 2002 5:26 PM > To: [EMAIL PROTECTED] > Subject: corrupted in

RE: QueryParser question - case-sensitivity

2002-05-09 Thread Doug Cutting
[I'm resending this from a different account, since my first attempt is bogged down somewhere. A second copy will probably show up tomorrow, but in the interests of solving this problem sooner, I'm resending it. Sorry for the duplication.] Define an Analyzer that does not lowercase the id field,

Re: Weighted index

2002-06-24 Thread Doug Cutting
Peter Carlson wrote: > I don't know the actual algorithm, but when you type in the search > > title:hello^3 AND heading:dolly^4 > > Will product different document scores than > > title:hello AND heading:dolly^4 > > Lucene will get the score for a given document, not a field. So it does > comb

Re: Stress Testing Lucene

2002-06-27 Thread Doug Cutting
It's very hard to leave an index in a bad state. Updating the "segments" file atomically updates the index. So the only way to corrupt things is to only partly update the segments file. But that too is hard, since it's first written to a temporary file, which is then renamed "segments". Th
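A sketch of the write-then-rename pattern described here, using `java.nio.file`. Readers see either the old "segments" file or the complete new one, never a half-written file. The temporary file name and the placeholder contents are assumptions for illustration, not Lucene's actual format.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class AtomicSegmentsSketch {
    static void writeSegments(Path indexDir, String contents) throws IOException {
        Path tmp = indexDir.resolve("segments.new"); // hypothetical temp name
        Files.writeString(tmp, contents, StandardCharsets.UTF_8);
        // ATOMIC_MOVE throws AtomicMoveNotSupportedException instead of silently
        // copying. Whether an existing target is replaced is platform-specific;
        // POSIX rename() replaces it.
        Files.move(tmp, indexDir.resolve("segments"), StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("segments-sketch");
        writeSegments(dir, "generation 1");
        writeSegments(dir, "generation 2");
        System.out.println(Files.readString(dir.resolve("segments"))); // generation 2
    }
}
```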

Re: Crash / Recovery Scenario

2002-07-10 Thread Doug Cutting
Karl Øie wrote: > If a crash happends during writing happens there is no good way to know if the > index is intact, removing lock files doesn't help this fact, as we really > don't know. So providing rollback functionality is a good but expensive way > of compensating for lack of recovery. The

Re: Crash / Recovery Scenario

2002-07-10 Thread Doug Cutting
Karl Øie wrote: > A better solution would be to hack the FSDirectory to store each file it would > store in a file-directory as a serialized byte array in a blob of a sql > table. This would increase performance because the whole Directory don't have > to change each time, and it doesn't have

Re: CachedSearcher

2002-07-15 Thread Doug Cutting
Halácsy Péter wrote: > A lot of people requested a code to cache opened Searcher objects until the index is >not modified. The first version of this was writed by Scott Ganyo and submitted as >IndexAccessControl to the list. > > Now I've decoupled the logic that is needed to manage searher. >

Re: CachedSearcher

2002-07-16 Thread Doug Cutting
Kelvin Tan wrote: > If the object has a close() method with public modifier, isn't it a common > idiom that client code needs to invoke close() explicitly? If there's no > real need to call close, maybe it can be changed to protected? Yes, that is a common idiom. In the case of Lucene's FSDire
