How to determine result quality

2011-11-14 Thread Samuel García Martínez
Hi list, I have spent a few days searching this list, the wiki, blog posts, etc. for information about score normalization in Lucene (I now know it can't be done). I'm going to explain my problem because I'm not sure that score normalization is what our project needs. *Background*: In our project, we are using So

Upgrade lucene from 2.3.2 to 3.1.0

2011-11-14 Thread Zhang, Lisheng
We plan to upgrade Lucene from 2.3.2 to 3.1.0. From reading "Lucene in Action" I learned that we should "warm up" the IndexSearcher and not expect the first few queries to be fast. But due to our special app we cannot "warm up" (each query has to use a new IndexSearcher); in Lucene 2.3.2 this se

Re: Upgrade lucene from 2.3.2 to 3.1.0

2011-11-14 Thread Earl Hood
On Mon, Nov 14, 2011 at 11:09 AM, Zhang, Lisheng wrote: > We plan to upgrade lucene from 2.3.2 to 3.1.0, from reading "Lucene In > Action" I learned > that we should "warm up" IndexSearcher and donot expect initial a few queries > to be fast. Make sure to QA things first. When we went from 2.4

RE: Upgrade lucene from 2.3.2 to 3.1.0

2011-11-14 Thread Zhang, Lisheng
Thanks very much for the reminder; yes, we always rebuild the index when upgrading Lucene. -----Original Message----- From: earlh...@gmail.com [mailto:earlh...@gmail.com] On Behalf Of Earl Hood Sent: Monday, November 14, 2011 11:54 AM To: java-user@lucene.apache.org Subject: Re: Upgrade lucene from 2.3.2

Index Corruption with Lucene 2.9.3

2011-11-14 Thread nishesh.gupta
We are seeing index corruption very often with version 2.9.3. Our indexing process runs on Linux (CentOS 5). The index is created on a mounted drive, which is a shared drive from a Windows 2008 server running in a VM. We generally see index corruption in merge or optimize after indexing runs continuous

RE: Upgrade lucene from 2.3.2 to 3.1.0

2011-11-14 Thread Uwe Schindler
Hi, > We plan to upgrade lucene from 2.3.2 to 3.1.0, from reading "Lucene In Action" > I learned that we should "warm up" IndexSearcher and donot expect initial a > few queries to be fast. This was always the case, not only since Lucene 2.9/3.0. You should warm your searchers. > But due to our

RE: Index Corruption with Lucene 2.9.3

2011-11-14 Thread Uwe Schindler
Hi, In general it's a bad idea to use Lucene on network-mounted drives. E.g., NFS is heavily broken with the file locking used by Lucene (NIO does not work at all, and file-based lock support fails because directory updates may not be visible at all times, or are visible before files are flushed -

RE: Index Corruption with Lucene 2.9.3

2011-11-14 Thread Uwe Schindler
One addition: Maybe you should update your antique Java version from the year 2007 (1.6.0_02) to something more up-to-date and maybe use 64 bit with mmap on a local filesystem for such a large index. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@theta

RE: Upgrade lucene from 2.3.2 to 3.1.0

2011-11-14 Thread Zhang, Lisheng
Thanks for your reply! The reason why we cannot reuse the IndexReader is that our server holds many (>4000) independent index folders, each of which corresponds to a separate URL. At any time any folder can be queried, so we cannot hold all of them in memory. In Lucene 2.3.2 a query is fast even if we rec
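
[Editor's sketch, not part of the original message.] The constraint described above, many index folders with only some of them open at once, is often handled with a bounded least-recently-used cache. The Java below is a minimal illustration under stated assumptions: `LruResourceCache`, `maxOpen`, and `opener` are hypothetical names, and the cached value is any `java.io.Closeable` standing in for an open index reader. It is not something the thread itself proposes, and it uses no Lucene API.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical helper: keeps at most maxOpen resources open and closes
 * the least recently used one when the limit is exceeded.
 */
class LruResourceCache<K, V extends Closeable> {
    private final Map<K, V> open;
    private final Function<K, V> opener;

    LruResourceCache(final int maxOpen, Function<K, V> opener) {
        this.opener = opener;
        // accessOrder = true makes iteration order least-recently-used first.
        this.open = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() > maxOpen) {
                    try {
                        eldest.getValue().close();
                    } catch (IOException e) {
                        // A real application would log this failure.
                    }
                    return true;
                }
                return false;
            }
        };
    }

    /** Returns the cached resource for key, opening it on first use. */
    synchronized V get(K key) {
        V value = open.get(key);
        if (value == null) {
            value = opener.apply(key);
            open.put(key, value);
        }
        return value;
    }

    /** Number of resources currently held open. */
    synchronized int size() {
        return open.size();
    }
}
```

One caveat with this sketch: an evicted resource is closed immediately, so a real server would also need some form of reference counting to avoid closing a reader while another thread is still searching with it.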

RE: Upgrade lucene from 2.3.2 to 3.1.0

2011-11-14 Thread Uwe Schindler
> Thanks for your reply! Thanks :-) > The reason why we cannot reuse IndexReader is that our server holds many > (>4000) independent index folders, each one corresponds to a separate URL. At > any time any folder can be queried, so we cannot hold all of them into > memory. So I expect the indexe

RE: Upgrade lucene from 2.3.2 to 3.1.0

2011-11-14 Thread Zhang, Lisheng
Our indexed data are around 200~300MB in size (each folder), so is it still small? Could you roughly estimate how big the indexed data (10GB?) needs to be before creating an IndexReader each time becomes a serious issue? Thanks very much for your help! Lisheng -----Original Message----- Fro

heads up -- reindex trunk indices

2011-11-14 Thread Michael McCandless
There was a sneaky bug, only in trunk (to be 4.0): https://issues.apache.org/jira/browse/LUCENE-3575 ... that causes field names to sometimes be silently wrong, for stored fields and term vectors, if you use addIndexes or you carried over a 3.x index. For example, you retrieve a stored doc t

RE: Index Corruption with Lucene 2.9.3

2011-11-14 Thread nishesh.gupta
Thanks Uwe for your comments. A few points to note about our setup: 1) At any time only one thread will be adding an index and merging it with the final index. Two threads will not concurrently be doing addindex and merge. 2) In the current setup where I am seeing the corruption, only one process is work

Re: Upgrade lucene from 2.3.2 to 3.1.0

2011-11-14 Thread Erick Erickson
It's hard to estimate this in the abstract; I'm afraid you'll just have to try it. Best Erick On Mon, Nov 14, 2011 at 6:40 PM, Zhang, Lisheng wrote: > Our indexed data are around 200~300MB size (each folder), so it is > still small? > > Could you roughly estimate how big the indexed data size (1

RE: Upgrade lucene from 2.3.2 to 3.1.0

2011-11-14 Thread Zhang, Lisheng
Yes, your point makes very good sense; thanks very much for your help! Lisheng -----Original Message----- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Monday, November 14, 2011 5:53 PM To: java-user@lucene.apache.org Subject: Re: Upgrade lucene from 2.3.2 to 3.1.0 It's hard to estimat