MergerIndex + Searchables

2004-12-21 Thread Karthik N S
Hi Guys Apologies... I have several MERGERINDEXES [MGR1, MGR2, MGR3]. For searching across these MERGERINDEXES I use the following code: IndexSearcher[] indexToSearch = new IndexSearcher[CNTINDXDBOOK]; for(int all=0;all<CNTINDXDBOOK;all++){ indexToSearch[all] = new

Synonyms for AND/OR/NOT operators

2004-12-21 Thread Sanyi
Hi! What is the simplest way to add synonyms for AND/OR/NOT operators? I'd like to support two sets of operator words, so people can use either the original English operators or my custom ones for our local language. Thank you for your attention! Sanyi

Re: MergerIndex + Searchables

2004-12-21 Thread Nader Henein
As obvious as it may seem, you could always store the ID of the index a document belongs to in the document itself and have that fetched with the search results. Or is there something stopping you from doing that? Nader Henein Karthik N S wrote: Hi Guys Apologies... I

Re: index size doubled?

2004-12-21 Thread Paul Elschot
On Tuesday 21 December 2004 05:49, aurora wrote: I'm testing the rebuilding of the index. I add several hundred documents, optimize and add another few hundred and so on. Right now I have around 7000 files. I observed after the index gets to a certain size. Every time after optimize, the

Re: MergerIndex + Searchables

2004-12-21 Thread Paul Elschot
Karthik, On Tuesday 21 December 2004 09:04, Karthik N S wrote: Hi Guys Apologies... I have several MERGERINDEXES [ MGR1,MGR2,MGR3]. for searching across these MERGERINDEXES I use the following Code IndexSearcher[] indexToSearch = new IndexSearcher[CNTINDXDBOOK];

Re: Synonyms for AND/OR/NOT operators

2004-12-21 Thread Erik Hatcher
On Dec 21, 2004, at 3:04 AM, Sanyi wrote: What is the simplest way to add synonyms for AND/OR/NOT operators? I'd like to support two sets of operator words, so people can use either the original English operators or my custom ones for our local language. There are two options that I know of: 1)

Re: Synonyms for AND/OR/NOT operators

2004-12-21 Thread Sanyi
Hi! I think we're talking about different things. My question is about using synonyms for AND/OR/NOT operators, not about synonyms of words in the index. For example, in some language: AND = AANNDD; OR = OORR; NOT = NNOOTT So, the user can enter: (cat OR kitty) AND black AND tail and either:

Re: Synonyms for AND/OR/NOT operators

2004-12-21 Thread Morus Walter
Erik Hatcher writes: On Dec 21, 2004, at 3:04 AM, Sanyi wrote: What is the simplest way to add synonyms for AND/OR/NOT operators? I'd like to support two sets of operator words, so people can use either the original English operators or my custom ones for our local language. There

Re: Synonyms for AND/OR/NOT operators

2004-12-21 Thread Erik Hatcher
Wow, I really did misunderstand. My apologies. Yes, you will need to fork QueryParser.jj and install JavaCC to build your custom parser. It should be pretty trivial to add alternatives to AND(+)/OR/NOT(-). Erik On Dec 21, 2004, at 4:42 AM, Sanyi wrote: Hi! I think we're talking about

Lucene index files from two different applications.

2004-12-21 Thread Gururaja H
Hi! I have two applications. Both are supposed to write Lucene index files, and the web application is supposed to read these index files. Here are the questions: 1. Can two applications write index files, in the same directory, at the same time? 2. If two applications cannot write index

Re: Lucene index files from two different applications.

2004-12-21 Thread Sergiu Gordea
Gururaja H wrote: Hi ! Have two applications. Both are supposed to write Lucene index files and the WebApplication is supposed to read these index files. Here are the questions: 1. Can two applications write index files, in the same directory, at the same time ? if you implement the

Re: Lucene index files from two different applications.

2004-12-21 Thread Erik Hatcher
On Dec 21, 2004, at 5:51 AM, Gururaja H wrote: 1. Can two applications write index files, in the same directory, at the same time ? If you mean to the same Lucene index, the answer is no. Only a single IndexWriter instance may be writing to an index at one time. 2. If two applications cannot

Re: Synonyms for AND/OR/NOT operators

2004-12-21 Thread Sanyi
Well, I guess I'd better recognize the operator synonyms and replace them with their original forms before passing the query to QueryParser. I don't feel comfortable tampering with Lucene's source code. Anyway, thanks for the answers. Sanyi --- Morus Walter [EMAIL PROTECTED] wrote: Erik Hatcher writes:
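The pre-processing idea Sanyi describes could be sketched roughly as follows. This is only an illustration, not code from the thread: the class name `OperatorSynonyms`, the method `normalize`, and the synonym table are hypothetical, and the sample words AANNDD/OORR/NNOOTT are taken from Sanyi's earlier example. The sketch leaves text inside double quotes untouched so that phrase queries are not altered. It does not handle escaped quotes, and it would also rewrite a synonym that appears as a term after a field prefix.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OperatorSynonyms {
    // Hypothetical local-language operator words mapped to the English operators.
    private static final Map<String, String> SYNONYMS = new LinkedHashMap<>();
    static {
        SYNONYMS.put("AANNDD", "AND");
        SYNONYMS.put("OORR", "OR");
        SYNONYMS.put("NNOOTT", "NOT");
    }

    // Rewrites operator synonyms outside quoted phrases; everything else is copied as-is.
    public static String normalize(String query) {
        StringBuilder out = new StringBuilder();
        StringBuilder word = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            if (c == '"') {
                // Flush the pending word using the quote state it was read in.
                flush(word, out, inQuotes);
                inQuotes = !inQuotes;
                out.append(c);
            } else if (Character.isLetterOrDigit(c)) {
                word.append(c);
            } else {
                flush(word, out, inQuotes);
                out.append(c);
            }
        }
        flush(word, out, inQuotes);
        return out.toString();
    }

    // Appends the pending word, replacing it only if it is an exact (case-sensitive)
    // synonym and it was read outside a quoted phrase.
    private static void flush(StringBuilder word, StringBuilder out, boolean inQuotes) {
        if (word.length() == 0) {
            return;
        }
        String w = word.toString();
        String replacement = inQuotes ? null : SYNONYMS.get(w);
        out.append(replacement != null ? replacement : w);
        word.setLength(0);
    }
}
```

The normalized string would then be handed to QueryParser unchanged, so no modification of QueryParser.jj is needed. Matching is case-sensitive here, mirroring the fact that the English operators are written in upper case.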

Re: Synonyms for AND/OR/NOT operators

2004-12-21 Thread Morus Walter
Sanyi writes: Well, I guess I'd better recognize and replace the operator synonyms to their original format before passing them to QueryParser. I don't feel comfortable tampering with Lucene's source code. Apart from knowing how to compile lucene (including the javacc code generation) you

Re: index size doubled?

2004-12-21 Thread Otis Gospodnetic
Another possibility is that you are using an older version of Lucene, which was known to have a bug with similar symptoms. Get the latest version of Lucene. You shouldn't really have multiple .cfs files after optimizing your index. Also, optimize only at the end, if you care about indexing

sorting on a field that can have null values (resend)

2004-12-21 Thread Praveen Peddi
I sent this mail yesterday but had no luck in receiving responses. Trying it again. Hi all, I am getting a null pointer exception when I sort on a field that has a null value for some documents. ORDER BY in SQL does work on such fields and I think it puts all results with null

Lucene working with a DB

2004-12-21 Thread Daniel Cortes
I read a lot of messages saying that Lucene can index a DB because it uses the INPUTSTREAM type. I don't understand how to do this. For example, if I have a forum with MySQL and a lot of files on my web, for every search I have to select the index that I want to use in my search, true? But I don't know how to

Stopwords in phrases

2004-12-21 Thread Ravi
I want to be able to use stopwords in exact phrase searches. I have looked at Nutch and used the same approach (replace common words with n-grams; look at net.nutch.analysis.CommonGrams). So if "to", "be", "or" and "not" are stop words, for the string "to be or not to be", the analyzer produces the
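As a rough illustration of the common-grams idea mentioned here, the following sketch emits every token and, whenever a token or its successor is a common word, an extra bigram joining the two. It is a simplified stand-in, not Nutch's CommonGrams implementation and not Ravi's analyzer: the class name, the method name and the "-" separator are assumptions made for the example. In a real Lucene analyzer the gram token would typically be emitted with a position increment of 0, which this plain-list sketch does not model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class CommonGramsSketch {
    // Returns the original tokens, each followed by a bigram "a-b" when either
    // a or its successor b is in the set of common (stop) words.
    public static List<String> withBigrams(List<String> tokens, Set<String> common) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String current = tokens.get(i);
            out.add(current);
            if (i + 1 < tokens.size()) {
                String next = tokens.get(i + 1);
                if (common.contains(current) || common.contains(next)) {
                    // Hypothetical separator; any character that cannot occur in a token works.
                    out.add(current + "-" + next);
                }
            }
        }
        return out;
    }
}
```

With the bigrams indexed alongside the plain terms, an exact phrase containing stop words can be matched through the gram terms instead of the stop words themselves.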

Re: Lucene working with a DB

2004-12-21 Thread Erik Hatcher
On Dec 21, 2004, at 10:39 AM, Daniel Cortes wrote: I read a lot of messages that Lucene can index a DB because it use that INPUTSTREAM type Where have you read that? This is incorrect. I don't understand how to do this. For example if I've a forum with Mysql and a lot of files on my web, for

Re: Stopwords in phrases

2004-12-21 Thread Erik Hatcher
On Dec 21, 2004, at 10:41 AM, Ravi wrote: I want to be able to use stopwords in exact phrase searches. I have looked at Nutch and used the same approach (replace common words with n-grams. Look at net.nutch.analysis.CommonGrams). So if to,be,or and not are stop words, for the string to be or

RE: Lucene index files from two different applications.

2004-12-21 Thread Chuck Williams
Depending on what you are doing, there are some problems with MultiSearcher. See http://issues.apache.org/bugzilla/show_bug.cgi?id=31841 for a description of the issues and possible patch(es) to fix. Chuck -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent:

Re: Lucene working with a DB

2004-12-21 Thread [EMAIL PROTECTED]
Hello, I'll just paste the relevant MySQL code; you add the calls to it per your needs. It has no checking of anything, so better add that as well. It's possible I didn't copy/paste everything, but you should get the idea where this is going... -pedja -- import

Re: index size doubled?

2004-12-21 Thread aurora
Thanks for the heads up. I'm using Lucene 1.4.2. I tried to do optimize() again but it has no effect. Adding just a tiny dummy document would get rid of it. I'm doing optimize every few hundred documents because I tried to simulate incremental update. This leads to another question I would

how often to optimize?

2004-12-21 Thread aurora
Right now I am incrementally adding about 100 documents to the index a day and then optimizing after that. I find that optimize essentially rebuilds the entire index into a single file. So the size of the disk write is proportional to the total index size, not to the size of documents

RE: Stopwords in phrases

2004-12-21 Thread Ravi
Are you also using the position increment of 0 for the gram tokens like Nutch does? Yes. I don't think considering only gram tokens will work for me because Nutch uses only bi-grams. It can only have one gram per token. In my case I have more than one and even if I get only the grams, I still

Re: how often to optimize?

2004-12-21 Thread Otis Gospodnetic
Hello, I think some of these questions may be answered in the jGuru FAQ. So my question is would it be an overkill to optimize everyday? Only if lots of documents are being added/deleted, and you end up with a lot of index segments. Is there any guideline on how often to optimize?

[ANNOUNCE] dotLucene1.4.3 RC1 (port of Jakarta Lucene to C#)

2004-12-21 Thread George Aroush
Hi Folks, I am pleased to announce the availability of dotLucene 1.4.3 RC1 build-001. This is the first Release Candidate of version 1.4.3 of Jakarta Lucene ported to C# and is intended to be Final. Please visit http://www.sourceforge.net/projects/dotlucene/ to learn more about dotLucene