RE: Wildcard Searching

2002-02-27 Thread Doug Cutting
From: Howk, Michael [mailto:[EMAIL PROTECTED]] Also, Lucene returns the parsed version of each of our searches. When we search by rou*d, Lucene parses it as rou*d (which is what we would expect). But when we search by rou?d, Lucene parses it as rou d. It seems to wrap the term in

RE: Boolean Query Parsing with IN keyword

2002-02-26 Thread Doug Cutting
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]] But, StandardAnalyzer is no longer final (get the latest build) and you can write a class that subclasses it Right. To flesh out Otis' example of how to change StandardAnalyzer's stop list by defining a subclass of it: public class

RE: Googlifying lucene querys

2002-02-25 Thread Doug Cutting
If you put the title in a separate field from the contents, and search both fields, matches in the title will usually be stronger, without explicit boosting. This is because the scores are normalized by the length of the field, and the title tends to be much shorter than the contents. So even
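The length-normalization effect described in this excerpt can be illustrated with a small sketch. It assumes the classic norm of 1/sqrt(number of terms in the field) and ignores any quantization a real index applies when storing norms. The class name and term counts are hypothetical.

```java
/**
 * Illustrative only: assumes the classic length norm 1/sqrt(numTerms)
 * and ignores any quantization a real index applies when storing norms.
 */
class LengthNormDemo {
    // Weight applied to every match in a field that holds numTerms tokens.
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    public static void main(String[] args) {
        int titleTerms = 4;   // hypothetical four-word title
        int bodyTerms = 400;  // hypothetical 400-word body
        // One occurrence of the query term in each field:
        System.out.println("title norm = " + lengthNorm(titleTerms)); // 0.5
        System.out.println("body norm  = " + lengthNorm(bodyTerms));  // 0.05
    }
}
```

Under this assumption a single title match is weighted ten times more heavily than a single body match, with no explicit boost.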

RE: Googlifying lucene querys

2002-02-25 Thread Doug Cutting
From: Joshua O'Madadhain [mailto:[EMAIL PROTECTED]] You cannot, in general, structure a Lucene query such that it will yield the same document rankings that Google would for that (query, document set). The reason for this is that Google employs a scoring algorithm that includes

RE: Lucene Query Structure

2002-02-19 Thread Doug Cutting
From: Joshua O'Madadhain [mailto:[EMAIL PROTECTED]] After considerable study of the documentation, I am still confused about the semantics of BooleanQuery. Now, as sjb pointed out, (query, false, false) doesn't really seem to have the semantics of a boolean OR. In fact, it does. In

RE: Qs re: document scoring and semantics

2002-02-19 Thread Doug Cutting
From: Joshua O'Madadhain [mailto:[EMAIL PROTECTED]] Is either of the expressions below the correct parenthesization of the expression above? If not, what is? score_d = sum_t(tf_q * (idf_t / norm_q) * tf_d * (idf_t / norm_d_t) * boost_t) * coord_q_d That's correct. The tf*idf weights
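The parenthesization confirmed above can be evaluated term by term. This is a minimal sketch with made-up inputs; the class name, array layout, and numbers are hypothetical and only mirror the symbols in the quoted formula.

```java
/** Illustrative sketch that evaluates the confirmed parenthesization term by term. */
class ScoreFormula {
    /**
     * score_d = sum_t(tf_q * (idf_t / norm_q) * tf_d * (idf_t / norm_d_t) * boost_t) * coord_q_d
     * Arrays are indexed by query term t; all inputs are illustrative.
     */
    static double score(double[] tfQ, double[] idf, double normQ,
                        double[] tfD, double[] normD, double[] boost,
                        double coordQD) {
        double sum = 0.0;
        for (int t = 0; t < idf.length; t++) {
            double queryWeight = tfQ[t] * (idf[t] / normQ);   // tf*idf weight in the query
            double docWeight = tfD[t] * (idf[t] / normD[t]);  // tf*idf weight in the document
            sum += queryWeight * docWeight * boost[t];
        }
        return sum * coordQD; // coord is applied once, outside the sum
    }

    public static void main(String[] args) {
        // Two query terms; the document matches only the first, so coord = 1/2.
        double s = score(new double[]{1, 1}, new double[]{2, 1}, 1.0,
                         new double[]{1, 0}, new double[]{1, 1}, new double[]{1, 1},
                         0.5);
        System.out.println(s); // 2.0
    }
}
```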

RE: using lucene with a very large index

2002-02-14 Thread Doug Cutting
From: tal blum [mailto:[EMAIL PROTECTED]] 2) Does the Document id change after merging indexes, adding, or deleting documents? Yes. 4) Assuming I have a term query that has a large number of hits, say 10 million, is there a way to get, say, the top 10 results without going through
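The reply to question 4 is cut off in this excerpt, so the following is not the method described there. It is a sketch of the usual general technique for keeping only the best k hits: a bounded min-heap. Every hit is still visited, but only k are retained, so the cost is O(n log k) rather than a full sort. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Illustrative sketch: keep only the k best-scoring hits in a bounded min-heap. */
class TopK {
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    static List<Hit> topK(int[] docs, float[] scores, int k) {
        // Min-heap ordered by score: the head is the weakest hit kept so far.
        PriorityQueue<Hit> heap = new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));
        for (int i = 0; i < docs.length; i++) {
            if (heap.size() < k) {
                heap.add(new Hit(docs[i], scores[i]));
            } else if (k > 0 && scores[i] > heap.peek().score) {
                heap.poll();                           // evict the weakest
                heap.add(new Hit(docs[i], scores[i]));
            }
        }
        List<Hit> result = new ArrayList<>(heap);
        result.sort((a, b) -> Float.compare(b.score, a.score)); // best first
        return result;
    }

    public static void main(String[] args) {
        int[] docs = {0, 1, 2, 3, 4};
        float[] scores = {0.1f, 0.9f, 0.5f, 0.7f, 0.3f};
        for (Hit h : topK(docs, scores, 2)) {
            System.out.println(h.doc + " " + h.score);  // "1 0.9" then "3 0.7"
        }
    }
}
```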

RE: write.lock file

2002-02-14 Thread Doug Cutting
I cannot replicate the problem you are having. Can you please submit a complete, self-contained, test case illustrating the problem you are having with the write lock. Please test this against the latest nightly build of Lucene, from: http://jakarta.apache.org/builds/jakarta-lucene/nightly/

RE: PrefixQuery Scoring

2002-02-13 Thread Doug Cutting
From: Jonathan Franzone [mailto:[EMAIL PROTECTED]] Whenever I add a PrefixQuery to my search the scoring gets really small. For example if I do a query like this: +java then the scoring starts around 0.866... and so forth. But if I do a query like this: +java* then the scoring start

RE: Indexing and Searching happening together

2002-02-01 Thread Doug Cutting
From: Kelvin Tan [mailto:[EMAIL PROTECTED]] True (and it's great) that once an IndexReader is open, no actions on the IndexWriter affect it. However, if an IndexReader is opened _after_ indexing begins, I suppose it'll throw an exception? Doesn't it mean that when indexing is taking

RE: Moving Index from Crawl/Build Server to Search Server

2002-01-31 Thread Doug Cutting
From: Mark Tucker [mailto:[EMAIL PROTECTED]] What is the best way to move the index from the build server to the search servers and then change which index a user is searching against? I am concerned about switching the index while a user is paging through search results. Ideally

RE: Obtaining all results efficiently. Closing a searcher.

2002-01-31 Thread Doug Cutting
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] Are you implying ( ... public synchronized Searcher getSearcher()) to use this synchronized method in a servlet/jsp thread as well? Yes. Your jhtml example doesn't appear to be synchronized. Maybe I'm missing something though.
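A minimal sketch of the pattern discussed here: every request thread obtains the searcher through a synchronized method, so all threads see the most recently installed one. The generic type T stands in for the searcher; the class name and the swap method are hypothetical additions for illustration, not part of the original example.

```java
/**
 * Illustrative sketch of a shared, swappable searcher. T stands in for the
 * searcher type; SearcherHolder and swap() are hypothetical names.
 */
class SearcherHolder<T> {
    private T current;

    SearcherHolder(T initial) { this.current = initial; }

    // Every request thread (servlet, JSP, ...) calls this method; synchronization
    // guarantees each thread sees the most recently installed searcher.
    synchronized T getSearcher() { return current; }

    // Installs a new searcher and returns the old one so the caller can close it
    // once in-flight requests are finished with it.
    synchronized T swap(T replacement) {
        T old = current;
        current = replacement;
        return old;
    }

    public static void main(String[] args) {
        SearcherHolder<String> holder = new SearcherHolder<>("index-v1");
        System.out.println(holder.getSearcher());                 // index-v1
        String old = holder.swap("index-v2");
        System.out.println(old + " -> " + holder.getSearcher());  // index-v1 -> index-v2
    }
}
```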

RE: strange search problems(cannot query for more than the first 10000 words!?!)

2002-01-28 Thread Doug Cutting
From: Karl Øie [mailto:[EMAIL PROTECTED]] I have created a test class for working with Analyzers and ran into a strange problem; I cannot search for text in fields with more than 10,000 words!?!? Lucene by default stops indexing after the 10,000th token. See
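The symptom follows directly from the token limit: anything after the cutoff is never indexed, so it can never match. This toy sketch mimics that behavior with a whitespace tokenizer; the class name is hypothetical and the 10,000 value simply mirrors the default mentioned in the reply.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative sketch: a field indexer that silently stops after maxTokens tokens. */
class TruncatingIndexer {
    static Set<String> indexField(String text, int maxTokens) {
        Set<String> indexed = new HashSet<>();
        String[] tokens = text.split("\\s+");
        for (int i = 0; i < tokens.length && i < maxTokens; i++) {
            indexed.add(tokens[i].toLowerCase());
        }
        return indexed; // tokens past maxTokens are never seen by search
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10_000; i++) sb.append("filler ");
        sb.append("needle");                                    // token number 10,001
        Set<String> terms = indexField(sb.toString(), 10_000);
        System.out.println(terms.contains("needle"));           // false
    }
}
```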

release 1.2 RC3

2002-01-28 Thread Doug Cutting
A new release of Lucene is available, 1.2 release candidate 3. The new release can be downloaded from: http://jakarta.apache.org/builds/jakarta-lucene/release/v1.2-rc3/ If no major problems are identified in the next few days, we will make a 1.2 final release--the first final release since

RE: Term ordering for IndexReader.termDocs()

2002-01-25 Thread Doug Cutting
From: Ype Kingma [mailto:[EMAIL PROTECTED]] I'm creating a filter from a set of terms that are read from a file, and I find that IndexReader.termDocs(Term(fieldName, valueFromFile)) does this quite well (around 0.1 secs elapsed time in jython code.) Would it be advantageous to sort the

RE: Industry Use of Lucene?

2001-12-07 Thread Doug Cutting
Kelvin, I don't see "Powered by Lucene" on your results pages: http://www.relevanz.com/Search?query=media If you add this, we can add you to the Powered by Lucene page: http://jakarta.apache.org/lucene/docs/powered.html What other sites should be added to this page? Doug -Original

RE: prefix query with multiple words

2001-12-04 Thread Doug Cutting
In short, this is not currently supported, but might be someday. For more details, see my recent response to a message with subject RE: Near without slop. Doug -Original Message- From: Tom Barrett [mailto:[EMAIL PROTECTED]] Sent: Monday, December 03, 2001 3:42 PM To: [EMAIL

RE: Near without slop

2001-12-04 Thread Doug Cutting
From: Paddy Clark [mailto:[EMAIL PROTECTED]] My current NEAR solution is to modify the query parser to build a PhraseQuery from the terms surrounding NEAR and set the slop correctly. This works for this kind of query: Bob NEAR Jim The problem comes when I try microsoft NEAR app*

RE: number of terms vs. number of fields

2001-12-03 Thread Doug Cutting
Lucene counts the same string in different fields as a different term. In other words, a term is composed of a field and a string. Doug -Original Message- From: Joshua O'Madadhain [mailto:[EMAIL PROTECTED]] Sent: Saturday, December 01, 2001 6:55 PM To: [EMAIL PROTECTED] Subject:
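The point can be shown with a tiny value type whose identity is the (field, text) pair, the same shape as Lucene's own Term. The class name FieldTerm is hypothetical, chosen to avoid confusion with the real class.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

/** Illustrative sketch: a term is identified by both its field and its text. */
final class FieldTerm {
    final String field;
    final String text;

    FieldTerm(String field, String text) { this.field = field; this.text = text; }

    @Override public boolean equals(Object o) {
        if (!(o instanceof FieldTerm)) return false;
        FieldTerm t = (FieldTerm) o;
        return field.equals(t.field) && text.equals(t.text);
    }

    @Override public int hashCode() { return Objects.hash(field, text); }

    public static void main(String[] args) {
        Set<FieldTerm> terms = new HashSet<>();
        terms.add(new FieldTerm("title", "lucene"));
        terms.add(new FieldTerm("body", "lucene"));   // same string, different field: a new term
        terms.add(new FieldTerm("title", "lucene"));  // exact duplicate: not added
        System.out.println(terms.size()); // 2
    }
}
```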

RE: Parallelising a query...

2001-11-29 Thread Doug Cutting
From: Winton Davies [mailto:[EMAIL PROTECTED]] I have 4 million documents... I could: Split these into 4 x 1 million document indexes and then send a query to 4 Lucene processes ? At the end I would have to sort the results by relevance. Question for Doug or any other
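The answer to this question is truncated in the archive, so the following is only a sketch of the final step Winton describes: each of the 4 processes returns its hits already sorted by descending score, and a k-way merge produces the global ranking. Document numbers are local to each index, so the shard id travels with each hit. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch: merge per-shard hit lists, each sorted by descending score. */
class ShardMerge {
    static final class Hit {
        final int shard, doc;   // doc numbers are only unique within a shard
        final float score;
        Hit(int shard, int doc, float score) { this.shard = shard; this.doc = doc; this.score = score; }
    }

    static List<Hit> merge(List<List<Hit>> shards, int n) {
        List<Hit> out = new ArrayList<>();
        int[] pos = new int[shards.size()];   // next unread hit in each shard
        while (out.size() < n) {
            int best = -1;
            for (int s = 0; s < shards.size(); s++) {
                if (pos[s] >= shards.get(s).size()) continue;  // shard exhausted
                float score = shards.get(s).get(pos[s]).score;
                if (best < 0 || score > shards.get(best).get(pos[best]).score) best = s;
            }
            if (best < 0) break;                               // every shard exhausted
            out.add(shards.get(best).get(pos[best]++));
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Hit>> shards = List.of(
            List.of(new Hit(0, 5, 0.9f), new Hit(0, 2, 0.4f)),
            List.of(new Hit(1, 7, 0.8f), new Hit(1, 1, 0.1f)));
        for (Hit h : merge(shards, 3)) {
            System.out.println("shard " + h.shard + " doc " + h.doc + " score " + h.score);
        }
    }
}
```

One caveat: each index computes idf from its own term statistics, so scores from separately built indexes are only approximately comparable.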

RE: Transactional Indexing

2001-11-29 Thread Doug Cutting
From: New, Cecil (GEAE) [mailto:[EMAIL PROTECTED]] I have noticed that when I kill/interrupt an indexing process, that it leaves a lock file, preventing further indexing. This raises a couple of questions: a. When I simply delete the file and restart the indexing, it seems to work. Is

RE: Parallelising a query...

2001-11-29 Thread Doug Cutting
TermDocs are ordered by document number. It would not be easy to change this. Doug -Original Message- From: Winton Davies [mailto:[EMAIL PROTECTED]] Sent: Thursday, November 29, 2001 11:12 AM To: Lucene Users List Subject: Re: Parallelising a query... Hi again

RE: IndexReader and IndexWriter on the same index

2001-11-27 Thread Doug Cutting
If you are performing additions and deletions then you should serially create an IndexReader to do deletions, close it, then create an IndexWriter to do additions, close it, and so on. Note that typically one will use a different IndexReader for deletions than is used for searching, so that

RE: Attribute Search

2001-11-26 Thread Doug Cutting
From: New, Cecil (GEAE) [mailto:[EMAIL PROTECTED]] this is exactly what I was doing. Store=false, index=true, and token=false. It appeared to work ok, but searches *never* returned any hits. That's why I suspect it is a bug. If you think this is a bug, please submit a test case, as

RE: Sorting Options for Query Results

2001-11-19 Thread Doug Cutting
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] I think this still works if the document numbers continue to increase by one when documents are added incrementally. Does anyone know if this is true? (I haven't looked at the code yet.) Yes, that is true, so long as you do not delete
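Given that condition, a "newest first" ordering reduces to sorting hits by descending document number. This is a minimal sketch that assumes documents are only ever appended and never deleted; the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Illustrative sketch: order hits newest-first by document number. */
class NewestFirst {
    // Only valid while documents are appended and never deleted:
    // then a higher document number means a later addition.
    static Integer[] newestFirst(Integer[] hitDocs) {
        Integer[] sorted = hitDocs.clone();
        Arrays.sort(sorted, Comparator.reverseOrder());
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(newestFirst(new Integer[]{3, 9, 1}))); // [9, 3, 1]
    }
}
```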

RE: Memory Usage?

2001-11-12 Thread Doug Cutting
) org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:114) org.apache.lucene.index.TermInfosReader.scanEnum(TermInfosReader.java:166) I've attached the whole trace as gzipped.txt regards, Anders Nielsen -Original Message- From: Doug Cutting [mailto:[EMAIL PROTECTED]] Sent: 10. november 2001 04:35

RE: Memory Usage?

2001-11-12 Thread Doug Cutting
From: Anders Nielsen [mailto:[EMAIL PROTECTED]] hmm, I seem to be getting a different number of hits when I use the files you sent out. Please provide more information! Is it larger or smaller than before? By how much? What differences show up in the hits? That's a terrible bug

RE: Problems with prohibited BooleanQueries

2001-11-01 Thread Doug Cutting
From: Scott Ganyo [mailto:[EMAIL PROTECTED]] How difficult would it be to get BooleanQuery to do a standalone NOT, do you suppose? That would be very useful in my case. It would not be that difficult, but it would make queries slow. All documents not containing a term would need to be
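Why a standalone NOT is slow can be seen from a toy complement over document numbers: the result must enumerate every document that lacks the term, so the work grows with the size of the whole index rather than with the term's posting list. This sketch is hypothetical and ignores deleted documents.

```java
import java.util.BitSet;

/** Illustrative sketch: a standalone NOT as the complement of a posting list. */
class StandaloneNot {
    // Returns every document in [0, maxDoc) that is NOT in the posting list.
    // Work is proportional to maxDoc; deleted documents are ignored here.
    static BitSet not(int[] postings, int maxDoc) {
        BitSet result = new BitSet(maxDoc);
        result.set(0, maxDoc);              // start with every document
        for (int doc : postings) {
            result.clear(doc);              // drop documents containing the term
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(not(new int[]{1, 3}, 6)); // {0, 2, 4, 5}
    }
}
```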

RE: Do range queries work?

2001-11-01 Thread Doug Cutting
From: Paul Friedman [mailto:[EMAIL PROTECTED]] It looks like there is a bug (besides the StandardAnalyzer parsing 20-35 as a single term). The query in your example: search(searcher, analyzer, FirstName:[a-k]); is not finding the correct document. It is finding doc2, it

RE: Querying an exact string match ?

2001-10-31 Thread Doug Cutting
This should work. You should be able to find an un-tokenized field containing spaces with a TermQuery. Nothing should ever tokenize the string. Can you please supply a simple, self-contained example showing that this does not work? Thanks, Doug -Original Message- From: Winton

RE: new Lucene release: 1.2 RC2

2001-10-22 Thread Doug Cutting
From: Sunil Zanjad [mailto:[EMAIL PROTECTED]] Indexes left in an inconsistent state on crash (I don't remember who I believe that even I have reported it. This happens on abrupt exit of the JVM. To do this I had one thread updating a directory containing many .txt files and

RE: Context specific summary with the search term

2001-10-22 Thread Doug Cutting
From: Lee Mallabone [mailto:[EMAIL PROTECTED]] I'm trying to implement this and should be able to contribute any successful results, but I need to produce context on a per-field basis. E.g. if I got a token hit in the text body of a document, but the first hit token was a word in the section

RE: File Handles issue

2001-10-15 Thread Doug Cutting
From: Scott Ganyo [mailto:[EMAIL PROTECTED]] Thanks for the detailed information, Doug! That helps a lot. Based on what you've said and on taking a closer look at the code, it looks like by setting mergeFactor and maxMergeDocs to Integer.MAX_VALUE, an entire index will be built in a

RE: File Handles issue

2001-10-11 Thread Doug Cutting
From: Scott Ganyo [mailto:[EMAIL PROTECTED]] We're having a heck of a time with too many file handles around here. When we create large indexes, we often get thousands of temporary files in a given index! Thousands, eh? That seems high. The maximum number of segments should be

RE: Does Lucene really work with Java 1.1.8

2001-10-09 Thread Doug Cutting
From: Brook, James [mailto:[EMAIL PROTECTED]] I am trying to use the 'lucene-1.2-rc1.jar' with a WebObjects 4.5 application, but having problems. WebObjects uses Java 1.1.8. I read on the jGuru Lucene FAQ that Lucene should work with this version of Java. Is this correct? It should,
