Re: IndexReader delete

2008-12-17 Thread Ganesh
Any opinion on this? - Original Message - From: "Ganesh" To: Sent: Wednesday, December 17, 2008 4:28 PM Subject: IndexReader delete When I perform a delete, I am getting the following exception org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: SimpleFSLock

Re: What are the best document edit options?

2008-12-17 Thread Chris Hostetter
: In-Reply-To: <13158.43731...@web45306.mail.sp1.yahoo.com> : Subject: What are the best document edit options? http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead

Re: Unique results in BooleanQuery

2008-12-17 Thread Chris Hostetter
: Let me expound more on the question. Will the q1 be run on the : BooleanQuery q2 and append the results that are not equal to the result : of the first query of q2? i really have no idea what you mean by: "q1 be run on the BooleanQuery q2" the query structure suggested will ensure that you o

Re: What are the best document edit options?

2008-12-17 Thread Thomas J. Buhr
Steve, Thanks for the helpful information, the addition of the new document methods makes things much better. One more question, is there JSON support in Lucene? JSON is more fat-free compared to XML and would be preferred. Digester works well for indexing XML but something along the same

Re: double metaphone for misspellings

2008-12-17 Thread Daniel Noll
Geoff Hendrey wrote: ((POINameType)name).getText().split("\\s"); //tokenize manually. (gosh, I thought the analyser would do this) The analyser does do this... but related to this, the Right Way to do it in your case would be to write your own analyser specifically for that field, and do all

Re: Combining results of multiple indexes

2008-12-17 Thread Preetham Kajekar
Thanks Erick and Michael. I will try out these suggestions and post my findings. ~preetham Erick Erickson wrote: Well, maybe if I'd read the original post more carefully I'd have figured that out, sorry 'bout that. I *think* I remember reading somewhere on the email lists that your indexing sp

double metaphone for misspellings

2008-12-17 Thread Geoff Hendrey
Apache commons codec library has double metaphone algorithm. I tried a series of experiments around storing the double metaphone representations of strings in the index itself, and searching using doublemetaphone version of search terms when the field I am searching against is stored as double meta

advice on DoubleMetaphoneSearching

2008-12-17 Thread Geoff Hendrey
Hi, I would like to have a Phrase Query in which the Terms are matched using the DoubleMetaphone algorithm. I found this link: http://www.tropo.com/techno/java/lucene/metaphone.html which describes a DoubleMetaphoneQuery, and indeed this query works amazingly well for misspellings, but only for

addIndexesNoOptimize question

2008-12-17 Thread Antony Bowesman
The javadocs state "This requires ... and the upper bound* of those segment doc counts not exceed maxMergeDocs." Can one of the gurus please explain what that means and what needs to be done to find out whether an index being merged fits that criteria. Thanks Antony

Re: lucene 2.4 sorting slowness

2008-12-17 Thread Erick Erickson
Are you measuring only the time to execute the searcher.search line or are you measuring the time it takes to iterate the Hits object? The reason I ask is that something like for (int idx = 0; idx < hits.length(); ++idx) { } will re-execute the query every 100 documents examined or so. For ex

RESOLVED: help: java.lang.ArrayIndexOutOfBoundsException ScorerDocQueue.downHeap

2008-12-17 Thread 1world1love
Just an FYI in case anyone runs into something similar. Essentially I had indexes that I have been searching from a java stored procedure in Oracle without issue for awhile. All of a sudden, I started getting the error I alluded to above when there were more than a certain number of terms (4,5, o

Re: lucene 2.4 sorting slowness

2008-12-17 Thread Michael McCandless
Are you warming the searcher first, and then testing the sort performance? (The first query is slow because it populates the FieldCache, internally, which is then reused for subsequent queries as long as you don't close that reader/searcher). Mike Chris Salem wrote: Hello, I have an i

lucene 2.4 sorting slowness

2008-12-17 Thread Chris Salem
Hello, I have an index with ~400 documents and some 200 fields. Searching without sorting takes around 300 - 500 ms; when sorting on dates (formatted as '-mm-dd'), searching takes 15 seconds on average. Here's the code that does the search: hits = searcher.search(query, new Sort(new

RE: What are the best document edit options?

2008-12-17 Thread Steven A Rowe
Hi Thomas, On 12/17/2008 at 11:52 AM, Thomas J. Buhr wrote: > Where can I see how IndexWriter.updateDocument works without getting > into Lucene all over again until this important issue is resolved? > Is there a sample of its usage for updating specific fields in a > given document? The updateDo

Re: How to search documents taking in account the dates ???

2008-12-17 Thread Ariel
Hi: This solution has a problem. The results are sorted by the year criterion, but after sorting by year I also need them sorted by score. How can I do this? I hope you can help me. Greetings Ariel On Wed, Nov 19, 2008 at 5:28 PM, Erick Erickson wrote: > Well, MultiSearch

Re: What is speeding up repeated searches?

2008-12-17 Thread Yonik Seeley
On Wed, Dec 17, 2008 at 12:49 PM, Annette Tisdale wrote: > I've noticed in our lucene app that subsequent identical searches are faster > than the first search. So if I search for "things you know" the first > response time will be 160ms, the second will be 23ms. Then if I search for > "something

Re: What are the best document edit options?

2008-12-17 Thread Erick Erickson
I'll leave those details to the experts who are up to speed. On Wed, Dec 17, 2008 at 11:52 AM, Thomas J. Buhr wrote: > Erick, > > Thanks for the good news, my question was still lingering from months ago > when I initially looked at an older Lucene. > > Now I need a bit more specific info, since

Re: Combining results of multiple indexes

2008-12-17 Thread Erick Erickson
Well, maybe if I'd read the original post more carefully I'd have figured that out, sorry 'bout that. I *think* I remember reading somewhere on the email lists that your indexing speed goes up pretty linearly as the number of indexing tasks approaches the number of CPUs. Are you, perhaps, on a dua

What is speeding up repeated searches?

2008-12-17 Thread Annette Tisdale
I've noticed in our lucene app that subsequent identical searches are faster than the first search. So if I search for "things you know" the first response time will be 160ms, the second will be 23ms. Then if I search for "something else" the first response time will be 133ms and the second will b

Re: Order of fields returned by Document.getFields()

2008-12-17 Thread Yonik Seeley
On Wed, Dec 17, 2008 at 10:32 AM, Patrick Johnstone wrote: > As I said in the original email, my issue is that I don't > think Lucene is returning the fields in the original order > anymore. Hmmm, you're right. http://wiki.apache.org/jakarta-lucene/LuceneFAQ states " What is the order of field

Re: What are the best document edit options?

2008-12-17 Thread Thomas J. Buhr
Erick, Thanks for the good news, my question was still lingering from months ago when I initially looked at an older Lucene. Now I need a bit more specific info, since much in my architecture rests on this ability to modify document fields dynamically. Where can I see how IndexWriter.upda

Re: Combining results of multiple indexes

2008-12-17 Thread Michael McCandless
Have you tested your indexing throughput with two threads sharing one IndexWriter (one index)? Mike Preetham Kajekar wrote: Hi Erick, Thanks for the response. Replies inline. Erick Erickson wrote: The very first question is always "are you opening a new searcher each time you query"? But

Re: Combining results of multiple indexes

2008-12-17 Thread Preetham Kajekar
Hi Erick, Thanks for the response. Replies inline. Erick Erickson wrote: The very first question is always "are you opening a new searcher each time you query"? But you've looked at the Wiki so I assume not. This question is closely tied to what kind of latency you can tolerate. A few more deta

RE: Order of fields returned by Document.getFields()

2008-12-17 Thread Patrick Johnstone
> -Original Message- > From: Yonik Seeley [mailto:ysee...@gmail.com] > Sent: Wednesday, December 17, 2008 10:07 AM > To: java-user@lucene.apache.org > Subject: Re: Order of fields returned by Document.getFields() > > Lucene guarantees the order of all stored fields returned. > Solr gua

Re: IDF scoring issue

2008-12-17 Thread Grant Ingersoll
On Dec 17, 2008, at 9:26 AM, Rajiv2 wrote: Because, the search term is provided by a user, and that user would explicitly have to put quotes around "marietta ga" when I believe the search text as it is : fleming roofing inc., marietta ga -- should score higher for "marietta ga" Just

Re: Combining results of multiple indexes

2008-12-17 Thread Erick Erickson
The very first question is always "are you opening a new searcher each time you query"? But you've looked at the Wiki so I assume not. This question is closely tied to what kind of latency you can tolerate. A few more details, please. What's slow? Queries? Indexing? How slow? 100ms? 100s? What ar

Re: Returning hits by highest score

2008-12-17 Thread Chris Bamford
Thanks Danil - I'd missed that. Danil ŢORIN wrote: According to http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/TopDocCollector.html it does. After search, simply retrieve TopDocs and read the documents you need: List result = new ArrayList(10); for( ScoreDoc sDoc :collector.topDo

Re: IDF scoring issue

2008-12-17 Thread Matthew Hall
Well, you could also do a simple test of removing IDF from the scoring equation and seeing if the query then reacts the way you want it to. Simply write your own custom similarity that does this, and test out to see how it works. Handily enough, I've already done this, so here's some code you

Re: Order of fields returned by Document.getFields()

2008-12-17 Thread Yonik Seeley
Lucene guarantees the order of all stored fields returned. Solr guarantees the order of all values in a *specific* field, but not the fields themselves. -Yonik On Tue, Dec 16, 2008 at 10:00 AM, Patrick Johnstone wrote: > > I'm using Lucene via Solr and recently upgraded from an early Summer nig

RE: Order of fields returned by Document.getFields()

2008-12-17 Thread Patrick Johnstone
> > > > I'm using Lucene via Solr and recently upgraded from an > early Summer > > nightly build to the released version of Solr 1.3 (which > seems to use > > something in the neighborhood of Lucene 2.3). I'm posting > this here > > because I believe that my issue is with Lucene, not Solr

Re: Combining results of multiple indexes

2008-12-17 Thread Preetham Kajekar
Hi Grant, Thanks for the response. Replies inline. Grant Ingersoll wrote: On Dec 17, 2008, at 12:57 AM, Preetham Kajekar wrote: Hi, I am new to Lucene. I am not using it as a pure text indexer. I am trying to index a Java object which has about 10 fields (like id, time, srcIp, dstIp) - most of

Re: IDF scoring issue

2008-12-17 Thread Rajiv2
Because, the search term is provided by a user, and that user would explicitly have to put quotes around "marietta ga" when I believe the search text as it is : fleming roofing inc., marietta ga -- should score higher for "marietta ga" rajiv Grant Ingersoll-6 wrote: > > > On Dec 16, 2008, at

Re: What are the best document edit options?

2008-12-17 Thread Erick Erickson
What version of Lucene are you using? The more recent ones have IndexWriter.updateDocument.. Best Erick On Wed, Dec 17, 2008 at 2:20 AM, Thomas J. Buhr wrote: > Hello Lucene, > > Looking at the document object it seems like each time I want to edit its > contents I need to do the following:

Re: Combining results of multiple indexes

2008-12-17 Thread Grant Ingersoll
On Dec 17, 2008, at 12:57 AM, Preetham Kajekar wrote: Hi, I am new to Lucene. I am not using it as a pure text indexer. I am trying to index a Java object which has about 10 fields (like id, time, srcIp, dstIp) - most of them being numerical values. In order to speed up indexing, I figured t

Re: Unique results in BooleanQuery

2008-12-17 Thread Erick Erickson
You could also think about a filter. Just run q1 as a regular query. Use one of the Collector methods to create a Filter. At the end, invert the Filter and use it as a parameter for your second query. Best Erick On Wed, Dec 17, 2008 at 12:23 AM, Jay Malaluan wrote: > > Hi, > > Anyone knowledgeab
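The collect, invert, then filter idea described above can be sketched in plain Java. This is only an illustration under stated assumptions: it does not call any Lucene API, document ids are modelled as plain ints, and the names `InvertedFilterSketch`, `collect`, `invert`, and `applyFilter` are hypothetical.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch, not Lucene code: doc ids are plain ints and the
// "queries" are just arrays of matching doc ids.
public class InvertedFilterSketch {

    /** Marks every doc id matched by the first query (q1). */
    public static BitSet collect(int[] q1Hits, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (int doc : q1Hits) {
            bits.set(doc);
        }
        return bits;
    }

    /** Flips bits 0..maxDoc-1 so that only docs NOT matched by q1 pass. */
    public static BitSet invert(BitSet bits, int maxDoc) {
        BitSet inverted = (BitSet) bits.clone();
        inverted.flip(0, maxDoc);
        return inverted;
    }

    /** Keeps only the q2 hits that the filter allows, preserving their order. */
    public static List<Integer> applyFilter(int[] q2Hits, BitSet filter) {
        List<Integer> kept = new ArrayList<>();
        for (int doc : q2Hits) {
            if (filter.get(doc)) {
                kept.add(doc);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        int maxDoc = 6;
        BitSet seenByQ1 = collect(new int[] {1, 3}, maxDoc);
        BitSet notSeenByQ1 = invert(seenByQ1, maxDoc);
        // Docs 1 and 3 were already returned by q1, so they are dropped from q2.
        System.out.println(applyFilter(new int[] {0, 1, 2, 3, 5}, notSeenByQ1));
    }
}
```

In a real index the bit set would be filled from a collector callback and handed to the second search as a filter; how exactly that is wired up depends on the Lucene version in use.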

Re: Order of fields returned by Document.getFields()

2008-12-17 Thread Grant Ingersoll
On Dec 16, 2008, at 10:00 AM, Patrick Johnstone wrote: I'm using Lucene via Solr and recently upgraded from an early Summer nightly build to the released version of Solr 1.3 (which seems to use something in the neighborhood of Lucene 2.3). I'm posting this here because I believe that m

Re: TopDocs - Get all docs?

2008-12-17 Thread Michael McCandless
It might be faster to use FieldCache.DEFAULT.getStrings(reader, "empid"), assuming empid is indexed but is not analyzed (or always analyzes to one token). Though, that then persists the resulting array in the FieldCache. We are wanting to create "column stride fields" (LUCENE-1231) to make

Re: Returning hits by highest score

2008-12-17 Thread Michael McCandless
Right, it returns the best 10 documents by score (not the first 10 docs it sees). You could also simply use the search(Query, int) method too (which just creates the TopDocCollector under the hood). Mike Danil ŢORIN wrote: According to http://lucene.apache.org/java/2_4_0/api/org/apach
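To see why a top-N collector can return the best N hits rather than the first N it encounters, here is a plain-Java sketch of a bounded min-heap. It is not Lucene's implementation and uses no Lucene classes; `TopNSketch`, `ScoredDoc`, and `topN` are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch, not Lucene code: keeps the n highest-scoring hits
// no matter in which order the hits arrive.
public class TopNSketch {

    /** Stand-in for a (doc id, score) pair. */
    public static final class ScoredDoc {
        public final int doc;
        public final float score;

        public ScoredDoc(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    private static final Comparator<ScoredDoc> BY_SCORE =
            Comparator.comparingDouble((ScoredDoc d) -> d.score);

    /** Returns the n highest-scoring hits, best first. */
    public static List<ScoredDoc> topN(List<ScoredDoc> hits, int n) {
        if (n <= 0) {
            return new ArrayList<>(); // PriorityQueue needs capacity >= 1
        }
        // Min-heap: the weakest of the current best n sits at the head.
        PriorityQueue<ScoredDoc> heap = new PriorityQueue<>(n, BY_SCORE);
        for (ScoredDoc hit : hits) {
            if (heap.size() < n) {
                heap.add(hit);
            } else if (hit.score > heap.peek().score) {
                heap.poll();   // evict the current weakest hit
                heap.add(hit); // and keep the stronger one
            }
        }
        List<ScoredDoc> result = new ArrayList<>(heap);
        result.sort(BY_SCORE.reversed());
        return result;
    }

    public static void main(String[] args) {
        List<ScoredDoc> hits = new ArrayList<>();
        hits.add(new ScoredDoc(0, 0.2f));
        hits.add(new ScoredDoc(1, 0.9f));
        hits.add(new ScoredDoc(2, 0.5f));
        hits.add(new ScoredDoc(3, 0.7f));
        for (ScoredDoc d : topN(hits, 2)) {
            System.out.println(d.doc + " " + d.score);
        }
    }
}
```

Because the heap only ever holds n entries, memory stays bounded even when a query matches thousands of documents.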

Re: IDF scoring issue

2008-12-17 Thread Grant Ingersoll
On Dec 16, 2008, at 8:19 PM, Rajiv2 wrote: Hello, I'm using the default lucene Queryparser on the search text : fleming roofing inc., marietta ga Also, I don't want to modify the search text by putting quotes around "marietta ga" which forces the query parser to make a phrase query. Why no

Re: Returning hits by highest score

2008-12-17 Thread Danil ŢORIN
According to http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/TopDocCollector.html it does. After search, simply retrieve TopDocs and read the documents you need: List result = new ArrayList(10); for( ScoreDoc sDoc :collector.topDocs().scoreDocs) { result.add(contentSearcher.doc(s

Re: TopDocs - Get all docs?

2008-12-17 Thread Donna L Gresh
Thanks- Yes in my use-case there are never any deleted documents when the search is run- (deletion takes place in a pre-processing stage) Toke Eskildsen wrote on 12/17/2008 08:16:31 AM: > On Mon, 2008-12-08 at 15:17 +0100, Donna L Gresh wrote: > > public Vector getIndexIds() throws Exce

Persian (Farsi) Language Analyzer

2008-12-17 Thread Ian Vink
I have ported the Java version of the recently committed Arabic analyzer to Lucene.Net. Has any work been done on a Farsi (Persian language) analyzer? Thanks, Ian

Re: TopDocs - Get all docs?

2008-12-17 Thread Toke Eskildsen
On Mon, 2008-12-08 at 15:17 +0100, Donna L Gresh wrote: > public Vector getIndexIds() throws Exception { > > Vector vec = new Vector(); > IndexReader ireader = IndexReader.open(directoryName); > int numdocs = ireader.numDocs(); >

Returning hits by highest score

2008-12-17 Thread Chris Bamford
Hi In a search I am doing, there may be thousands of hits, of which I only want the 10 with the highest score. Will the following code do this for me, or will it simply return the first 10 it finds? TopDocCollector collector = new TopDocCollector(10); contentSearcher.search(q, collector); If

IndexReader delete

2008-12-17 Thread Ganesh
When I perform a delete, I am getting the following exception org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: SimpleFSLock@/ org.apache.lucene.store.Lock.obtain(Lock.java:85) org.apache.lucene.index.DirectoryIndexReader.acquireWriteLock(DirectoryIndexR

Re: Inquiry on Lucene Stemming

2008-12-17 Thread Jokin Cuadrado
Well, you could use the queryparser wildcard searches (flash*), but it doesn't use stemming logic, it just returns all the words that start with that string. You must be aware that the queryparser rewrites the query with every term that matches the wildcard, so if your prefix is short it's easy to g