Re: "docMap" array in SegmentMergeInfo

2005-10-11 Thread Dawid Weiss
I'm pretty sure it doesn't solve the problem in general (it isn't a thread-safe solution for sure; you mentioned the memory barrier, and I'd add compiler optimizations). If it works it must be something application-specific, maybe synchronization isn't really needed there, or you just don't do an

Re: "docMap" array in SegmentMergeInfo

2005-10-11 Thread Yonik Seeley
> We've been using this in production for a while and it fixed the > extremely slow searches when there are deleted documents. Who was the caller of isDeleted()? There may be an opportunity for an easy optimization to grab the BitVector and reuse it instead of repeatedly calling isDeleted() on the
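The suggestion to grab the deleted-docs bits once and reuse them, rather than calling a synchronized isDeleted() per document, can be sketched as follows. This is only an illustration under stated assumptions: DeletedDocsSketch is a hypothetical stand-in class (not Lucene's SegmentReader), java.util.BitSet stands in for Lucene's BitVector, and the "-1 for deleted" numbering of the doc map is assumed for the example.

```java
import java.util.BitSet;

/** Hypothetical stand-in for a segment reader; NOT Lucene's SegmentReader. */
final class DeletedDocsSketch {
    // java.util.BitSet is used here as a stand-in for Lucene's BitVector.
    private final BitSet deleted = new BitSet();
    private final int maxDoc;

    DeletedDocsSketch(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    synchronized void delete(int doc) {
        deleted.set(doc);
    }

    // One lock acquisition per call: expensive when invoked once per document
    // from many threads at the same time.
    synchronized boolean isDeleted(int doc) {
        return deleted.get(doc);
    }

    // Take the lock once and return a private copy that the caller can read
    // without any further synchronization.
    synchronized BitSet snapshotDeleted() {
        return (BitSet) deleted.clone();
    }

    /**
     * Builds an old-doc to new-doc map from a single snapshot.
     * Assumption for this sketch: deleted documents map to -1.
     */
    int[] buildDocMap() {
        BitSet snapshot = snapshotDeleted();
        int[] docMap = new int[maxDoc];
        int next = 0;
        for (int i = 0; i < maxDoc; i++) {
            docMap[i] = snapshot.get(i) ? -1 : next++;
        }
        return docMap;
    }
}
```

The trade-off is that the snapshot reflects deletions only up to the moment it was taken, which is acceptable when the index is effectively read-only during the loop.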

Re: "docMap" array in SegmentMergeInfo

2005-10-11 Thread Yonik Seeley
I'm not sure that looks like a safe patch. Synchronization does more than help prevent races... it also introduces memory barriers. Removing synchronization to objects that can change is very tricky business (witness the double-checked locking antipattern). -Yonik Now hiring -- http://tinyurl.com/
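To make the memory-barrier point concrete, here is a minimal sketch of lazy initialization that stays safe without locking on every read. It assumes the Java 5 memory model (JSR-133), where a volatile write happens-before a later volatile read of the same field. LazyHolderSketch and its int[] payload are hypothetical names for illustration, not code from the patch under discussion.

```java
/** Hypothetical example class; not taken from Lucene or from the patch. */
final class LazyHolderSketch {
    // volatile supplies the memory barrier: without it, another thread could
    // see a non-null reference to an array whose contents are not yet visible.
    private volatile int[] cache;
    private int builds = 0; // guarded by the lock on this

    int[] get() {
        int[] local = cache; // single volatile read on the fast path
        if (local == null) {
            synchronized (this) {
                local = cache; // re-check under the lock
                if (local == null) {
                    builds++;
                    local = new int[] {1, 2, 3};
                    cache = local; // volatile write publishes the array
                }
            }
        }
        return local;
    }

    synchronized int buildCount() {
        return builds;
    }
}
```

Dropping the volatile keyword from this sketch yields the broken double-checked locking variant mentioned above.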

Re: Lucene and Quartz

2005-10-11 Thread javabuddy
Thanks for the reply. Here is what happens... I have 2 boxes A and B. And the indices are created on Machine C. The directory of the index is mounted on both the machines A and B. We have quartz using JDBCJobStore. So index creation runs on either one of the boxes. So when the index creation job is

Re: "docMap" array in SegmentMergeInfo

2005-10-11 Thread Chris Lamprecht
Hi Peter, I observed the same issue on a multiprocessor machine. I included a small fix for this in the NIO patch (against the 1.9 trunk) here: http://issues.apache.org/jira/browse/LUCENE-414#action_12322523 The change amounts to the following methods in SegmentReader.java, to remove the need s

Re: wildcards within a phrase query

2005-10-11 Thread Daniel Naber
On Wednesday 12 October 2005 00:15, Robert Watkins wrote: > Wonderful! But what about wildcards? I realised after I had sent the > last message that my pattern should have been written: Have a look at the test cases: you need to expand the terms yourself, i.e. it doesn't matter if there's a prefi

Re: "docMap" array in SegmentMergeInfo

2005-10-11 Thread Peter Keegan
> If the index is in 'search/read-only' mode, is there a way around this bottleneck? The obvious answer (to answer my own question) is to optimize the index. But the question remains: why is the docMap created and never used? Peter

Re: wildcards within a phrase query

2005-10-11 Thread Robert Watkins
Wonderful! But what about wildcards? I realised after I had sent the last message that my pattern should have been written: ( term | term as prefix | wildcard term )+ -- Robert On Tue, 11 Oct 2005, Daniel Naber wrote: On Tuesday 11 October 2005 22:53, Robert Watkins wrote: I was under th

Re: "docMap" array in SegmentMergeInfo

2005-10-11 Thread Peter Keegan
On a multi-cpu system, this loop to build the docMap array can cause severe thread thrashing because of the synchronized method 'isDeleted'. I have observed this on an index with over 1 million documents (which contains a few thousand deleted docs) when multiple threads perform a search with either

Re: wildcards within a phrase query

2005-10-11 Thread Daniel Naber
On Tuesday 11 October 2005 22:53, Robert Watkins wrote: > I was under the impression that PhrasePrefixQuery only worked in the > special case of the term that would otherwise be used in a PrefixQuery > coming at the end of the sequence of terms, as in: No, the test cases show that the prefix ter

Re: wildcards within a phrase query

2005-10-11 Thread Robert Watkins
I was under the impression that PhrasePrefixQuery only worked in the special case of the term that would otherwise be used in a PrefixQuery coming at the end of the sequence of terms, as in: ( term )+ ( term as prefix ) but not where either a WildcardQuery or a PrefixQuery occurs anywhere in t

Re: Lucene and Quartz

2005-10-11 Thread Nader Henein
1) A FileNotFound Exception isn't a Lucene issue as much as it's a file system issue; which file is "not found"? What's in the logs? 2) As for simultaneous indexing on two separate indices, there should be absolutely no problem; we simultaneously index 10 parallel indices using quartz and it'

Re: query across fields?

2005-10-11 Thread Marc Hadfield
thanks again! Doug Cutting wrote: Marc Hadfield wrote: In the SpanNear (or for that matter PhraseQuery), one can set a slop value where 0 (zero) means one following after the other. How can one differentiate between Terms at the **same** position vs. one after the other? The followi

RE: Hits sorted

2005-10-11 Thread Mordo, Aviran (EXP N-NANNATEK)
Just use the Sort option in the searcher http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Searcher.html#search(org.apache.lucene.search.Query,%20org.apache.lucene.search.Sort) Aviran http://www.aviransplace.com -Original Message- From: Daniel Cortes [mailto:[EMAIL PROTECT

Re: query across fields?

2005-10-11 Thread Doug Cutting
Marc Hadfield wrote: In the SpanNear (or for that matter PhraseQuery), one can set a slop value where 0 (zero) means one following after the other. How can one differentiate between Terms at the **same** position vs. one after the other? The following queries only match "x" and "y" at the sa

Hits sorted

2005-10-11 Thread Daniel Cortes
Hi everybody, I have a problem when I find all the documents added in the last days in my index. It works well, but I want to show these results sorted. What do I have to do? My code is this: private RangeQuery findINTODates(int days) { Term from; Term to; Calendar calendar =

RE: One index or 2 indices

2005-10-11 Thread Mordo, Aviran (EXP N-NANNATEK)
Well there isn't really much difference. If you have a large amount of data then I would suggest 2 indexes, but if not then one index will work too. HTH Aviran http://www.aviransplace.com -Original Message- From: Sharma, Siddharth [mailto:[EMAIL PROTECTED] Sent: Tuesday, October 11, 2005 2:

One index or 2 indices

2005-10-11 Thread Sharma, Siddharth
Hiya Given that I have two high level business entities, catalog (containing product information) and contract (containing filter criteria about which products are available for sale and which are not), what is a better approach? 1. To have two different indices and query them separately. OR 2. H

Re: wildcards within a phrase query

2005-10-11 Thread Daniel Naber
On Tuesday 11 October 2005 15:32, Robert Watkins wrote: > The only idea that comes to mind is to try to combine a PhraseQuery and > a PrefixQuery Yes, PhrasePrefixQuery already supports that. Regards Daniel -- http://www.danielnaber.de ---

Re: query across fields?

2005-10-11 Thread Marc Hadfield
Hello - a quick follow-up to my previous post. In the SpanNear (or for that matter PhraseQuery), one can set a slop value where 0 (zero) means one following after the other. How can one differentiate between Terms at the **same** position vs. one after the other? ie: (Token)/Position (A)

RE: Can't find record when I'm sure I should

2005-10-11 Thread Mordo, Aviran (EXP N-NANNATEK)
You might want to check your analyzer; it might trim or ignore these names. Aviran http://www.aviransplace.com -Original Message- From: Dan Quaroni [mailto:[EMAIL PROTECTED] Sent: Tuesday, October 11, 2005 2:22 PM To: java-user@lucene.apache.org Subject: Can't find record when I'm sure

Re: Can't find record when I'm sure I should

2005-10-11 Thread Chris Hostetter
Try Luke, see exactly what is indexed for these companies. : Date: Tue, 11 Oct 2005 14:21:54 -0400 : From: Dan Quaroni <[EMAIL PROTECTED]> : Reply-To: java-user@lucene.apache.org : To: java-user@lucene.apache.org : Subject: Can't find record when I'm sure I should : : I have a set of indexes co

Can't find record when I'm sure I should

2005-10-11 Thread Dan Quaroni
I have a set of indexes containing business information (name, address, phone, etc). There are a couple particular companies that don't come up when people search for them. I've used our debugging app that allows lucene queries to be executed directly, and I have confirmed this. I can find t

Re: Bad behaviors of FrenchAnalyzer

2005-10-11 Thread Marvin Humphrey
On Oct 11, 2005, at 10:04 AM, Hugo Lafayette wrote: First of all, and maybe I make a false assumption here, but if you strip leading "j'", "t'" and so on, that means that if you make a search like: +text:"il m'aime" you will get documents with the sentence "il m'aime" (French for "he lov

Re: Bad behaviors of FrenchAnalyzer

2005-10-11 Thread Hugo Lafayette
Marvin Humphrey wrote: > I'm curious: are there any cases in French where a string with an > apostrophe in it ought to be split into two searchable tokens? I > know of no such cases in English: you never want to search for the ll > in you'll, or the O in O'Reilly, etc. First of all, and ma

Re: Bad behaviors of FrenchAnalyzer

2005-10-11 Thread Marvin Humphrey
On Oct 11, 2005, at 7:52 AM, Hugo Lafayette wrote: Why not include that in the FrenchStemFilter "next()" method itself? Would that be a bad design? I agree with your assessment. Conceptually, this is a stemming problem. By extension, it's not a tokenizing problem, and the behavior o

Lucene and Quartz

2005-10-11 Thread javabuddy
Hi, I have the indexing process running in a quartz environment (on two clustered boxes). I made sure that the indexing doesn't run simultaneously on both the boxes. But suddenly I started getting "FileNotFoundException" in the indexing process. From that point on the indexes are of no use.

Re: Bad behaviors of FrenchAnalyzer

2005-10-11 Thread Erik Hatcher
On Oct 11, 2005, at 10:52 AM, Hugo Lafayette wrote: Erik Hatcher wrote: Rather than changing StandardAnalyzer, you could create a custom Analyzer that is something along the lines of StandardTokenizer -> custom apostrophe splitting filter -> ISOLatinFilter. Why not include that in the
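The proposed chain (tokenize, then split at the apostrophe, then fold accents) can be sketched in plain Java to show the intended token output. This is a rough illustration only: it does not use Lucene's TokenFilter API, ElisionSketch and the ELIDED prefix list are hypothetical, whitespace splitting stands in for StandardTokenizer, and java.text.Normalizer stands in for the accent-folding filter.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical plain-Java illustration; does not use the Lucene analysis API. */
final class ElisionSketch {
    // Assumed list of elided prefixes; a real analyzer would need a vetted set.
    private static final List<String> ELIDED =
            Arrays.asList("l", "d", "j", "t", "m", "n", "s", "c", "qu");

    // Drops a leading elided article or pronoun such as "m'" or "l'".
    static String stripElision(String token) {
        int apos = token.indexOf('\'');
        if (apos > 0 && ELIDED.contains(token.substring(0, apos).toLowerCase(Locale.ROOT))) {
            return token.substring(apos + 1);
        }
        return token; // e.g. O'Reilly is left whole
    }

    // Decomposes accented letters and removes the combining marks.
    static String stripAccents(String token) {
        String decomposed = Normalizer.normalize(token, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    // Whitespace split -> elision stripping -> accent folding -> lowercasing.
    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue;
            }
            out.add(stripAccents(stripElision(raw)).toLowerCase(Locale.ROOT));
        }
        return out;
    }
}
```

With this sketch, "il m'aime" yields the tokens il and aime, so the same steps would have to run at both index and query time for phrase searches to keep matching.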

Re: Bad behaviors of FrenchAnalyzer

2005-10-11 Thread Hugo Lafayette
Erik Hatcher wrote: > Rather than changing StandardAnalyzer, you could create a custom > Analyzer that is something along the lines of StandardTokenizer -> > custom apostrophe splitting filter -> ISOLatinFilter. Why not include that in the FrenchStemFilter "next()" method itself? It wil

Re: Bad behaviors of FrenchAnalyzer

2005-10-11 Thread Erik Hatcher
On Oct 11, 2005, at 9:22 AM, Hugo Lafayette wrote: - accented characters: The French analyzer keeps accents, which could be useful, but may also become annoying. I just have to add the ISOLatinFilter.java to correct that, but maybe adding an option to keep them or not could be useful. - ap

wildcards within a phrase query

2005-10-11 Thread Robert Watkins
I've been trying to figure out the best way to support queries of the ilk: "going to he* in a hand-basket" such that it's almost a PhraseQuery, except that the third term (in this case) is a PrefixQuery. The only idea that comes to mind is to try to combine a PhraseQuery and a PrefixQuery (or

Bad behaviors of FrenchAnalyzer

2005-10-11 Thread Hugo Lafayette
Hi there, I just tested the French analyzer, which works well for the most part (the stemmer particularly). But at the moment, I see two unexpected behaviors with the default configuration: - accented characters: The French analyzer keeps accents, which could be useful, but may also become annoying. I just have

Re: Intersecting queries

2005-10-11 Thread Paul Libbrecht
search a query using a filter that's a query-filter ?? paul On 11 Oct 05, at 13:36, Trond Aksel Myklebust wrote: I need to be able to intersect the result of two queries based on a field "ID". So if I do a search: Content2 = "something totally" and a search: Content1 = "something" I wan

Intersecting queries

2005-10-11 Thread Trond Aksel Myklebust
I need to be able to intersect the result of two queries based on a field "ID". So if I do a search: Content2 = "something totally" and a search: Content1 = "something" I want to return only Document 2 based on the field ID being the same. Any tip on how to do this in Lucene, or should I go for

RE: Getting index lock while indexing

2005-10-11 Thread M å n i s h
Yes, sometimes it creates a lock file in tomcat. But nowadays I am not able to index even after deleting the lock files. I checked tomcat's temp folder and java.io.tmpdir; nothing is there. Even if I am not closing the index it should index after deleting the lock files, (Correct me if I am wrong

Re: Getting index lock while indexing

2005-10-11 Thread Oren Shir
Do you keep open IndexReader, IndexWriter or IndexSearcher? Try closing them during shutdown. On 9/29/05, M å n i s h <[EMAIL PROTECTED]> wrote: > > > Hi, > I am having trouble indexing files sometimes, > My application is deployed in tomcat and some times when I try to stop and > restart indexing

Re: What is MMapDirectory?

2005-10-11 Thread Koji Sekiguchi
Paul, Thank you very much for your explanation. However, in case you have different experience, we'd like to know. I don't have it. I'm just curious. Thank you again, Koji - Original Message - From: "Paul Elschot" <[EMAIL PROTECTED]> To: Sent: Tuesday, October 11, 2005 4:16 PM

Re: What is MMapDirectory?

2005-10-11 Thread Paul Elschot
Koji, On Sunday 09 October 2005 14:12, Koji Sekiguchi wrote: > Hello, > > What is MMapDirectory? > > I've searched mailing list archive, but cannot find it. > I could find the following explanation at Lucene 1.9 CHANGES.txt: > > 8. Add MMapDirectory, which uses nio to mmap input files. This i