Tuning Indexing performance question ..

2006-04-10 Thread Mufaddal Khumri
Hi, I am using a multi-threaded app to index a bunch of data. The app spawns X threads. Each thread writes to a RAMDirectory. When a thread finishes its work, the contents of its RAMDirectory are written into the FSDirectory. All threads are passed an instance of the FSWriter when th

RE: Update or Delete Document for Lucene 1.4.x

2006-03-31 Thread Mufaddal Khumri
The way you update a document in Lucene is by deleting the current one and adding a new one. -Mufaddal. -----Original Message----- From: Don Vaillancourt [mailto:[EMAIL PROTECTED] Sent: Friday, March 31, 2006 1:37 PM To: java-user@lucene.apache.org Subject: Update or Delete Document for Lucene
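As far as I recall, Lucene 1.4.x has no in-place update: you delete the old version by a unique key term (for example with `IndexReader.delete(Term)`) and then add the new version through an `IndexWriter`. The toy below models only that delete-then-add sequence in plain Java, with a list of field maps standing in for the index; the `ToyIndex` class and the `id` key field are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for an index: each document is a field-name -> value map. */
class ToyIndex {
    private final List<Map<String, String>> docs = new ArrayList<>();

    void add(Map<String, String> doc) {
        docs.add(new HashMap<>(doc));
    }

    /** Deletes every document whose field equals value; returns how many were removed. */
    int deleteByTerm(String field, String value) {
        int before = docs.size();
        docs.removeIf(d -> value.equals(d.get(field)));
        return before - docs.size();
    }

    /** "Update" = delete all docs carrying the same unique key, then add the new version. */
    void update(String keyField, Map<String, String> newDoc) {
        deleteByTerm(keyField, newDoc.get(keyField));
        add(newDoc);
    }

    List<Map<String, String>> find(String field, String value) {
        List<Map<String, String>> hits = new ArrayList<>();
        for (Map<String, String> d : docs) {
            if (value.equals(d.get(field))) hits.add(d);
        }
        return hits;
    }

    int size() {
        return docs.size();
    }
}
```

For this to work in Lucene, the key field should be indexed untokenized (for example `Field.Keyword` in 1.4) so that the delete term matches the stored value exactly.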

RE: Regarding Indexes

2006-03-31 Thread Mufaddal Khumri
The solution to your problem lies in the answers to many business-domain-specific questions, like: 1. Will each company only want to carry out searches on their data, or on ALL the data? 2. If you do not know the answer to that, is there a chance that some companies would want to search only their

Re: Getting no hits ...

2006-02-23 Thread Mufaddal Khumri
Name:"eos\-20d"^80.0) I modified my searching code to use the standard analyzer, but I did not get any hits back. I am still trying to figure the problem out. Any ideas? Mufaddal Khumri wrote: In my earlier email I put in the wrong query that I am searching on. The correct q

Re: Getting no hits ...

2006-02-23 Thread Mufaddal Khumri
s when you tokenize "ES-20D"? 3) Have you tried a simpler query (i.e., just "content:es\-20d")? 4) When giving QueryParser a (quoted) phrase search, I don't think you really want to escape that "-" character. : Date: Thu, 23 Feb 2006 14:16:42 -0700 : From:

Getting no hits ...

2006-02-23 Thread Mufaddal Khumri
I have been trying to figure out why my query below would not return any hits. I use two custom analyzers for indexing and searching. The one I use for indexing uses this: public TokenStream tokenStream(String fieldName, Reader reader) { TokenStream result = new StandardTokenizer

hyphen not being removed by standard filter

2006-02-22 Thread Mufaddal Khumri
Hi, I might be missing something. I have a custom analyzer the gist of which is: public TokenStream tokenStream(String fieldName, Reader reader) { TokenStream result = new StandardTokenizer(reader); result = new StandardFilter(result);
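As far as I know, `StandardFilter` in Lucene 1.4/1.9 does only two things: it strips a trailing possessive `'s` and removes the dots from acronyms such as `U.S.A.`; it never touches hyphens. On top of that, `StandardTokenizer` keeps a token like `eos-20d` in one piece because it contains a digit. The plain-Java model below mimics just those two filter rules to show why the hyphen survives. It is a simplification, not Lucene's code: the real filter decides by token type, while this sketch decides by regex.

```java
import java.util.Locale;
import java.util.regex.Pattern;

/** Simplified model of StandardFilter's two normalizations, decided here by regex instead of token type. */
class StandardFilterModel {
    // Letters separated by dots, e.g. "U.S.A." or "I.B.M"
    private static final Pattern ACRONYM = Pattern.compile("(\\p{L}\\.)+\\p{L}?\\.?");

    static String normalize(String token) {
        // Rule 1: drop a trailing possessive 's / 'S
        if (token.endsWith("'s") || token.endsWith("'S")) {
            return token.substring(0, token.length() - 2);
        }
        // Rule 2: remove the dots from acronyms
        if (ACRONYM.matcher(token).matches()) {
            return token.replace(".", "");
        }
        // Anything else, including hyphenated tokens, passes through unchanged
        return token;
    }

    /** StandardAnalyzer also lowercases after the filter; included for the full picture. */
    static String analyze(String token) {
        return normalize(token).toLowerCase(Locale.ROOT);
    }
}
```

If `eos-20d` must also match `eos 20d`, the hyphen has to be handled by your own tokenizer or filter; StandardFilter will not split it.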

RE: ArrayIndexOutOfBoundsException being thrown ...

2006-02-22 Thread Mufaddal Khumri
I switched back to lucene-1.4.3.jar and I don't get the exception any more. Is this a bug in the new jar? -Mufaddal. -----Original Message----- From: Mufaddal Khumri [mailto:[EMAIL PROTECTED] Sent: Wed 2/22/2006 10:20 AM To: java-user@lucene.apache.org Subject: ArrayIndexOutOfBoundsException

ArrayIndexOutOfBoundsException being thrown ...

2006-02-22 Thread Mufaddal Khumri
Getting an ArrayIndexOutOfBoundsException ... Line 31 in IndexSearcherManager.java: ... public static IndexSearcher getIndexSearcher(String indexPath) { logger.debug("indexPath = " + indexPath); searcher =

Re: get results by relevance, limiting results and then sort the results by some criterion

2006-02-21 Thread Mufaddal Khumri
Hi, That's exactly what I am doing currently. I was just wondering if there is a Lucene way to do what I am doing, using QueryFilter etc. -Thanks. Dan Armbrust wrote: Mufaddal Khumri wrote: When I do a search, for example on "batteries", I get 1200+ results. I would like to show the

Re: get results by relevance, limiting results and then sort the results by some criterion

2006-02-21 Thread Mufaddal Khumri
e. After all, they may search for "bolt"; maybe they want an ancillary product. -----Original Message----- From: Mufaddal Khumri [mailto:[EMAIL PROTECTED] Sent: Tuesday, February 21, 2006 12:06 PM To: java-user@lucene.apache.org Subject: Re: get results by relevance, limit

Re: get results by relevance, limiting results and then sort the results by some criterion

2006-02-21 Thread Mufaddal Khumri
So yes, if the (x+1)th item happens to be a camera and its price happens to be lower than that of the previous x cameras, it won't be included in this view, and that is exactly what we want. Mufaddal Khumri wrote: In my case, when we search for, let's say, cameras, my top x results are all sorts of

Re: get results by relevance, limiting results and then sort the results by some criterion

2006-02-21 Thread Mufaddal Khumri
to sort on the full document list, and then return the top 300 that you want the user to see. I think I'm just curious why getting rid of some that could (in a new sort) be of higher relevance is a good thing. -----Original Message----- From: Mufaddal Khumri [mailto:[EMAIL PROTECTE

get results by relevance, limiting results and then sort the results by some criterion

2006-02-21 Thread Mufaddal Khumri
When I do a search, for example on "batteries", I get 1200+ results. I would like to show the user, let's say, 300. I can do that by only extracting the first 300 hits (sorted by decreasing relevance by default) and displaying those to the user. Now on the search results page, I have a drop-down bo
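The approach discussed in this thread (keep the top N hits by relevance, then re-sort only those N by another field such as price) can be done after the search in plain Java. The sketch below assumes a hypothetical `ScoredHit` holder with `score` and `price`; in Lucene you would fill it from `Hits.score(i)` and a stored price field.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical holder for one search result. */
class ScoredHit {
    final String id;
    final float score;
    final double price;

    ScoredHit(String id, float score, double price) {
        this.id = id;
        this.score = score;
        this.price = price;
    }
}

class TopNThenSort {
    /** Keeps the n most relevant hits, then orders just those by ascending price. */
    static List<ScoredHit> topNByPrice(List<ScoredHit> hits, int n) {
        List<ScoredHit> byScore = new ArrayList<>(hits);
        byScore.sort(Comparator.comparingDouble((ScoredHit h) -> h.score).reversed());
        List<ScoredHit> top = new ArrayList<>(byScore.subList(0, Math.min(n, byScore.size())));
        top.sort(Comparator.comparingDouble(h -> h.price));
        return top;
    }
}
```

With hits `a` (score 0.9, price 300), `b` (0.8, 100) and `c` (0.1, 10) and n = 2, the result is `b`, `a`: the cheap but barely relevant `c` is dropped, which is the behavior described in this thread.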

Re: exact match ..

2006-02-20 Thread Mufaddal Khumri
lds: categoryNames < analyzed by keyword analyzer Is there a way I could have a single document object have some fields analyzed by my custom analyzer and the one field - "categoryNames" analyzed by the keyword analyzer? Thanks, Mufaddal Khumri wrote: Hi Steve, If I understand yo
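Lucene ships a `PerFieldAnalyzerWrapper` (in `org.apache.lucene.analysis`) that delegates to an analyzer registered for a field and falls back to a default otherwise, which is one way to analyze `categoryNames` with `KeywordAnalyzer` while other fields use a custom analyzer; check that your Lucene version includes both classes. The plain-Java model below only illustrates the dispatch idea; `PerFieldModel` and its two tokenizer functions are simplified, hypothetical stand-ins.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/** Models per-field analyzer dispatch: a field-specific analyzer if registered, else a default. */
class PerFieldModel {
    private final Function<String, List<String>> defaultAnalyzer;
    private final Map<String, Function<String, List<String>>> byField = new HashMap<>();

    PerFieldModel(Function<String, List<String>> defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    void addAnalyzer(String field, Function<String, List<String>> analyzer) {
        byField.put(field, analyzer);
    }

    List<String> analyze(String field, String text) {
        return byField.getOrDefault(field, defaultAnalyzer).apply(text);
    }

    /** Stand-in for a tokenizing analyzer: lowercase, split on non-alphanumerics. */
    static List<String> simpleTokens(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"));
    }

    /** Stand-in for a keyword analyzer: the whole value is one token. */
    static List<String> keyword(String text) {
        return Collections.singletonList(text);
    }
}
```

At search time the `categoryNames` query has to skip tokenization as well (for example a `TermQuery` on the whole value), otherwise the phrase is split again and the exact match is lost.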

Re: exact match ..

2006-02-20 Thread Mufaddal Khumri
eywordAnalyzer approach I can index the categoryNames field using this analyzer . Would I be using the QueryParser to create my query and specify the keyword analyzer to it while searching on categoryNames ? (and then make that query part of my global boolean query?) -Thanks. Steven Rowe wr

span first query and boosting ..

2006-02-20 Thread Mufaddal Khumri
Hi, I do this: SpanFirstQuery fullPhraseInCategoryNamesQuery = new SpanFirstQuery(new SpanTermQuery(new Term("categoryNames", "digital cameras")), 2); fullPhraseInCategoryNamesQuery.setBoost(8); In my log output I get this: spanFirst(categoryNames:digit camera, 2)) Why can't I boost a span q

exact match ..

2006-02-20 Thread Mufaddal Khumri
Let's say I do this while indexing: doc.add(Field.Text("categoryNames", categoryNames)); Now, when searching categoryNames, I search for "digital cameras". I only want to match documents that have exactly the phrase "digital cameras" in the categoryName

StandardAnalyzer question ...

2006-02-20 Thread Mufaddal Khumri
Hi, When StandardAnalyzer is used to index documents, aren't the terms, amongst other things, lowercased and stored that way in the index? I have an index field that I index like this: ramWriter = new IndexWriter(ramDir, standardAnalyzer, true); ... ... doc.add(Field.Text("categoryN

Re: StandardAnalyzer .. stemming

2006-02-17 Thread Mufaddal Khumri
Thank you. I think in my case I can just use the last approach you suggested. One more question: which jar is SnowballFilter part of? Chris Hostetter wrote: : The SnowballAnalyzer seems to offer stemming. The StandardAnalyzer on : the other hand has a bunch of other niceness. What is the best pr

StandardAnalyzer .. stemming

2006-02-17 Thread Mufaddal Khumri
The SnowballAnalyzer seems to offer stemming. The StandardAnalyzer, on the other hand, has a bunch of other niceness. What is the best practice for leveraging both of these analyzers while indexing and searching? Do I chain them up somehow, and if so, what APIs do I look at for doing so? Do I implemen
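In Lucene an analyzer is a tokenizer followed by a chain of filters, so combining StandardAnalyzer's behavior with stemming usually means writing a small analyzer whose `tokenStream` wraps `StandardTokenizer` in `StandardFilter`, `LowerCaseFilter`, `StopFilter` and finally a stemming filter such as the contrib `SnowballFilter`. The sketch below models only that chaining idea with plain Java functions; the whitespace tokenizer, the stop list and the one-rule `STEM` step are hypothetical placeholders, not Snowball.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

/** Models an analyzer as a tokenizer followed by an ordered chain of token filters. */
class AnalyzerChain {
    // Hypothetical stop list
    private static final Set<String> STOP = new HashSet<>(Arrays.asList("a", "an", "the", "of"));

    static final UnaryOperator<List<String>> LOWERCASE = ts -> {
        List<String> out = new ArrayList<>();
        for (String t : ts) out.add(t.toLowerCase(Locale.ROOT));
        return out;
    };

    static final UnaryOperator<List<String>> STOPWORDS = ts -> {
        List<String> out = new ArrayList<>();
        for (String t : ts) if (!STOP.contains(t)) out.add(t);
        return out;
    };

    /** Placeholder stemmer: drops a final "s" unless the word ends in "ss". */
    static final UnaryOperator<List<String>> STEM = ts -> {
        List<String> out = new ArrayList<>();
        for (String t : ts) out.add(t.endsWith("s") && !t.endsWith("ss") ? t.substring(0, t.length() - 1) : t);
        return out;
    };

    /** Order matters: lowercase before stop-word removal and stemming, as in a Lucene filter chain. */
    static List<String> analyze(String text) {
        List<String> tokens = Arrays.asList(text.trim().split("\\s+")); // whitespace tokenizer stand-in
        for (UnaryOperator<List<String>> f : Arrays.asList(LOWERCASE, STOPWORDS, STEM)) {
            tokens = f.apply(tokens);
        }
        return tokens;
    }
}
```

The same chain has to be applied at index time and at query time (or at least produce the same terms); otherwise stemmed index terms will not match unstemmed query terms.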

Re: Strange Problem ... Luke returns results Lucene api does not.

2006-02-16 Thread Mufaddal Khumri
using with Luke? You have some capitalized words in your query, and most analyzers would lowercase those, which may be the issue (perhaps you indexed the capitalized words?). Erik On Feb 16, 2006, at 2:41 PM, Mufaddal Khumri wrote: Hi,

Re: Strange Problem ... Luke returns results Lucene api does not.

2006-02-16 Thread Mufaddal Khumri
those, which may be the issue (perhaps you indexed the capitalized words?). Erik On Feb 16, 2006, at 2:41 PM, Mufaddal Khumri wrote: Hi, I have a query that gets hits via luke. I can see the documents it finds. But when I run the same query via my java code it returns 0 hits. Note: 1

Strange Problem ... Luke returns results Lucene api does not.

2006-02-16 Thread Mufaddal Khumri
Hi, I have a query that gets hits via Luke. I can see the documents it finds. But when I run the same query via my Java code, it returns 0 hits. Note: 1. I am using the standard analyzer while indexing and searching. 2. I have made sure that I am querying the same index via Luke or through my Java
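Erik's reply in this thread points at the capitalized words: StandardAnalyzer lowercases at index time, so a query term that bypasses analysis (for instance a hand-built `TermQuery` on `"Radio"`) never matches, whereas the same text parsed with the same analyzer does. The plain-Java model below shows that effect with a set of indexed terms; `CaseMismatchDemo` and its methods are hypothetical.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Shows why an un-analyzed query term can miss terms that were lowercased at index time. */
class CaseMismatchDemo {
    /** Index-time analysis stand-in: split on whitespace and lowercase each term. */
    static Set<String> indexTerms(String text) {
        Set<String> terms = new HashSet<>();
        for (String t : text.split("\\s+")) terms.add(t.toLowerCase(Locale.ROOT));
        return terms;
    }

    /** Query term used exactly as typed (like a TermQuery built by hand). */
    static boolean matchesRaw(Set<String> index, String queryTerm) {
        return index.contains(queryTerm);
    }

    /** Query term passed through the same analysis as the index. */
    static boolean matchesAnalyzed(Set<String> index, String queryTerm) {
        return index.contains(queryTerm.toLowerCase(Locale.ROOT));
    }
}
```

With an index built from "Sony Radio MP3 Player", `matchesRaw(idx, "Radio")` is false while `matchesAnalyzed(idx, "Radio")` is true.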

Lucene Query ... understanding

2006-02-16 Thread Mufaddal Khumri
Hi, I am just trying to see if I understand the Lucene query below correctly. +(+contentNew:radio +contentNew:mp3) +entity:product +(name:radio mp3^4.0 (contentNew:radio contentNew:mp3) contentNew:radio mp3^2.0) Let me see if I can understand the above query correctly: 1. the contentNew field ha
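In Lucene's query syntax, `+` marks a required clause and an unmarked clause is optional. A BooleanQuery that has required clauses matches only documents satisfying all of them (its optional clauses then only affect the score), while a BooleanQuery made up only of optional clauses needs at least one of them to match. Read that way, the query above requires both `radio` and `mp3` in `contentNew`, requires `entity:product`, and requires at least one clause of the last group. The toy evaluator below models that matching rule (not scoring) on sets of `field:term` strings; the `Clause` class is hypothetical and phrase clauses are simplified to single terms.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

/** Toy model of BooleanQuery matching: required (+) vs optional clauses. */
abstract class Clause {
    final boolean required;

    Clause(boolean required) {
        this.required = required;
    }

    abstract boolean matches(Set<String> doc);

    /** A single "field:term" clause. */
    static Clause term(boolean required, String fieldTerm) {
        return new Clause(required) {
            boolean matches(Set<String> doc) {
                return doc.contains(fieldTerm);
            }
        };
    }

    /** A nested boolean group such as +( ... ). */
    static Clause bool(boolean required, Clause... subs) {
        List<Clause> list = Arrays.asList(subs);
        return new Clause(required) {
            boolean matches(Set<String> doc) {
                boolean anyRequired = false;
                boolean anyOptionalMatched = false;
                for (Clause c : list) {
                    boolean m = c.matches(doc);
                    if (c.required) {
                        anyRequired = true;
                        if (!m) return false; // every + clause must match
                    } else if (m) {
                        anyOptionalMatched = true;
                    }
                }
                // With + clauses present, optional ones only affect score; otherwise one must match
                return anyRequired || anyOptionalMatched;
            }
        };
    }
}
```

A document containing `contentNew:radio`, `contentNew:mp3` and `entity:product` matches a model of this query; drop `contentNew:mp3` and it no longer does.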

Index location

2005-08-29 Thread Mufaddal Khumri
Hi, I have been trying to control where Lucene creates the search index for my web application. I am tweaking the following code in order to specify the location for the index, but it seems that Lucene is creating the index in the location from which my CreateIndex.class is invoked. Here is the
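A likely explanation for an index appearing wherever `CreateIndex` is launched is that a relative path was passed to `FSDirectory`/`IndexWriter`: Java resolves relative paths against the `user.dir` system property, i.e. the JVM's working directory. The standard-library sketch below shows that resolution and one way to pin the location to an absolute path; the `index.dir` property name is a hypothetical configuration key.

```java
import java.io.File;

/** Shows how a relative index path is resolved, and how to force an absolute location. */
class IndexLocation {
    /** A relative path is resolved against the JVM working directory (the "user.dir" property). */
    static File resolve(String path) {
        return new File(path).getAbsoluteFile();
    }

    /** Hypothetical config: read "index.dir" if set, else fall back to a default; always absolute. */
    static File configuredIndexDir(String fallback) {
        String dir = System.getProperty("index.dir", fallback);
        return new File(dir).getAbsoluteFile();
    }
}
```

In a web application the working directory is wherever the container was started, so passing an absolute path (for example one derived from `ServletContext.getRealPath` or from a configured property) is the more predictable choice.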

de pluralization

2005-08-04 Thread Mufaddal Khumri
Hello, I am just posting this question here since this might be a common problem and some of you might have good pointers. Are there algorithms/APIs built into Lucene that would help de-pluralize words while indexing and/or while searching the index? Are there analyzers that do this already? T
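Stemming filters (for example the built-in `PorterStemFilter` or the contrib `SnowballFilter`) reduce plurals as a side effect. If only de-pluralization is wanted, a much lighter option is Harman's "S" stemmer, which applies just three suffix rules. The sketch below is my own plain-Java version of those rules, not a Lucene class; wrapping it in a `TokenFilter` is left out.

```java
import java.util.Locale;

/** Harman's "S" stemmer: applies the first rule whose condition holds, then stops. */
class SStemmer {
    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        // Rule 1: "-ies" -> "-y", except "-eies" and "-aies"
        if (w.endsWith("ies") && !w.endsWith("eies") && !w.endsWith("aies")) {
            return w.substring(0, w.length() - 3) + "y";
        }
        // Rule 2: "-es" -> "-e", except "-aes", "-ees" and "-oes"
        if (w.endsWith("es") && !w.endsWith("aes") && !w.endsWith("ees") && !w.endsWith("oes")) {
            return w.substring(0, w.length() - 1);
        }
        // Rule 3: drop a final "s", except "-us" and "-ss"
        if (w.endsWith("s") && !w.endsWith("us") && !w.endsWith("ss")) {
            return w.substring(0, w.length() - 1);
        }
        return w;
    }
}
```

It maps `batteries` to `battery`, `cameras` to `camera` and `shoes` to `shoe`, and leaves `glass` alone. It is deliberately crude (`boxes` becomes `boxe`), but conservative and predictable; like any stemmer, it must run at both index and query time.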

RE: Question regarding boosting

2005-05-20 Thread Mufaddal Khumri
represented as a blank in the query. Is that fine? The results from executing this query seem alright, but is this a good way of achieving the results I was trying to achieve? (NOTE: My original post explains what I am trying to do). Any insight would be appreciated. Mufaddal. -Original

RE: Indexing in multi-threaded environment

2005-05-03 Thread Mufaddal Khumri
Hi, calls to IndexWriter.addIndexes are synchronized. Your code should not have to do anything more than just call it. I believe this is roughly the scenario you are looking for: - while (there is more data) - spawn a thread to handle creating documents for this data
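The scenario described here (each worker builds documents into its own private buffer, such as a RAMDirectory, then merges it into the shared index through the synchronized `IndexWriter.addIndexes`) can be modelled with an `ExecutorService`. In the sketch, a `List<String>` per task stands in for a RAMDirectory and a synchronized method stands in for `addIndexes`; `ParallelIndexModel` and the batch contents are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Models per-thread private buffers merged into one shared index under a lock. */
class ParallelIndexModel {
    private final List<String> sharedIndex = new ArrayList<>();

    /** Stand-in for IndexWriter.addIndexes: one merge at a time. */
    synchronized void addIndexes(List<String> privateBuffer) {
        sharedIndex.addAll(privateBuffer);
    }

    synchronized int docCount() {
        return sharedIndex.size();
    }

    /** Each batch of raw records is turned into "documents" by its own task, then merged. */
    void indexAll(List<List<String>> batches, int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (List<String> batch : batches) {
            pool.submit(() -> {
                List<String> ramBuffer = new ArrayList<>(); // private to this task, no locking needed
                for (String record : batch) ramBuffer.add("doc:" + record);
                addIndexes(ramBuffer); // the only shared, synchronized step
            });
        }
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("indexing did not finish in time");
        }
    }
}
```

In real Lucene code, each worker should also close its own RAM-side writer before handing the directory to `addIndexes`; see the "Lucene loosing documents?" thread.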

RE: Lucene loosing documents?

2005-04-28 Thread Mufaddal Khumri
: Lucene loosing documents? Can you close the ramDirectory first and then add it via fsWriter and see if that solves it? Otis --- Mufaddal Khumri <[EMAIL PROTECTED]> wrote: Hi, I am trying to index 20349 records. When I index using the FSDirectory I get 20349 docume

Lucene loosing documents?

2005-04-28 Thread Mufaddal Khumri
Hi, I am trying to index 20349 records. When I index using the FSDirectory, I get 20349 documents; this is correct. Now, when I use the RAMDirectory to create my index and write all documents from the RAMDirectory to the FSDirectory, I consistently get only 20340 documents. This is the only change I
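The gap here is 20349 - 20340 = 9 documents, which fits Otis's suggestion. An `IndexWriter` buffers recently added documents in memory (10 by default in Lucene 1.4, if I remember the `minMergeDocs` default correctly) and only writes them to its directory when the buffer fills or the writer is closed, so copying the RAMDirectory before closing its writer would miss the last partial buffer. The plain-Java model below reproduces that arithmetic; `BufferedWriterModel` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical writer that buffers documents and only flushes full buffers (or everything on close). */
class BufferedWriterModel {
    private final int bufferSize;
    private final List<Integer> buffer = new ArrayList<>();
    final List<Integer> directory = new ArrayList<>(); // what a copy of the directory would see

    BufferedWriterModel(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    void addDocument(int doc) {
        buffer.add(doc);
        if (buffer.size() >= bufferSize) flush();
    }

    private void flush() {
        directory.addAll(buffer);
        buffer.clear();
    }

    /** Closing flushes the final partial buffer. */
    void close() {
        flush();
    }
}
```

With a buffer of 10 and 20349 added documents, the directory holds 20340 before `close()` and 20349 after it. So the fix to try is: `ramWriter.close()` first, then `fsWriter.addIndexes(new Directory[] { ramDir })`.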

RE: Lucene bulk indexing

2005-04-19 Thread Mufaddal Khumri
d it finishes in about twenty minutes, so you should be able to index 2 rows in a few seconds. Make sure your database table(s) are indexed appropriately according to your select statements. Indexing correctly will be the biggest performance improvement you will see. Bes

Lucene bulk indexing

2005-04-19 Thread Mufaddal Khumri
Hi, I am sure this question must have been raised before, and maybe it has even been answered. I would be grateful if someone could point me in the right direction or give their thoughts on this topic. The problem: I have approximately over 2 products that I need to index. At the moment I get X num

Problem when searching ..

2005-04-15 Thread Mufaddal Khumri
Hi, I am using Lucene to create an index of my data, which is persisted by Hibernate. I am running my indexer on a huge data set. Indexing takes 1312805 ms (about 22 minutes), at the end of which I get a 26,266 KB directory. I can view the contents of my index directory using Luke. When I copy my webapp under Tomcat

RE: Escaping special characters

2005-04-06 Thread Mufaddal Khumri
Analyzer? Thanks. -----Original Message----- From: Chuck Williams [mailto:[EMAIL PROTECTED] Sent: Wednesday, April 06, 2005 11:39 PM To: java-user@lucene.apache.org Subject: Re: Escaping special characters Mufaddal Khumri writes (4/6/2005 11:21 PM): Hi, Am new to Luce

Escaping special characters

2005-04-06 Thread Mufaddal Khumri
Hi, I am new to Lucene. I found the following page: http://lucene.apache.org/java/docs/queryparsersyntax.html. At the bottom of the page there is a section explaining that, in order to escape special characters, one would use "\". I have an Indexer that indexes product names. Some product names have "-" c
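The query syntax page lists these special characters: `+ - && || ! ( ) { } [ ] ^ " ~ * ? : \`. Some Lucene versions provide a static `QueryParser.escape(String)` helper (check your version's Javadoc); if yours lacks it, an equivalent is easy to write. The sketch below is my own version, not Lucene's source, and escapes every character that appears in that list, including a single `&` or `|`.

```java
/** Backslash-escapes Lucene query syntax characters so the parser treats them literally. */
class QueryEscaper {
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\";

    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) sb.append('\\'); // prefix each special char with a backslash
            sb.append(c);
        }
        return sb.toString();
    }
}
```

For example, `escape("EOS-20D")` yields `EOS\-20D`. Escaping only affects the query parser: whether the term then matches still depends on how the analyzer tokenized the product name at index time, and, as suggested in the "Getting no hits" thread, a hyphen inside a quoted phrase should not need escaping.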