Are there any Lucene optimizations applicable to SSD?

2008-08-19 Thread Cedric Ho
Hi all, We are testing Lucene with SSD. No doubt the performance is much better than that of a normal hard disk. However it's still not good enough for our particular case. So I wonder if there are any tips for optimizing lucene performance on SSDs. For example, I saw that Lucene's

Re: Simple Query Question

2008-08-19 Thread Ian Lea
No, lucene does not automatically replace spaces with AND. See http://lucene.apache.org/java/2_3_2/queryparsersyntax.html -- Ian. On Tue, Aug 19, 2008 at 1:34 AM, DanaWhite [EMAIL PROTECTED] wrote: For some reason I am thinking I read somewhere that if you queried something like: Eiffel

java.lang.NullPointerException while indexing on linux

2008-08-19 Thread Aditi Goyal
Hi All, I am using IndexWriter for adding the documents. I am re-using the document as well as the fields to improve indexing speed, as per the link http://wiki.apache.org/lucene-java/ImproveIndexingSpeed. So, for each doc, I am first removing the field using doc.removeField() and then

Re: Are there any Lucene optimizations applicable to SSD?

2008-08-19 Thread Toke Eskildsen
On Tue, 2008-08-19 at 16:22 +0800, Cedric Ho wrote: [Lucene on SSD] However it's still not good enough for our particular case. So I wonder if there are any tips for optimizing lucene performance on SSDs. What aspect of performance do you find lacking? Is it searching or indexing? While

Re: java.lang.NullPointerException while indexing on linux

2008-08-19 Thread Ian Lea
Hi I don't think you need to remove the field and then add it again, but I've no idea if that is relevant to your problem or not. A full stack trace would be more help, and maybe an upgrade to 2.3.2, and maybe a snippet of your code, and what is JCC? -- Ian. On Tue, Aug 19, 2008 at 10:09

Re: java.lang.NullPointerException while indexing on linux

2008-08-19 Thread Michael McCandless
Ian Lea wrote: I don't think you need to remove the field and then add it again, but I've no idea if that is relevant to your problem or not. That's right: just leave the Field there and change its value (assuming the doc you are changing to still uses that field). A full stack trace
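The advice above (keep the Field and just change its value rather than removing and re-adding it) can be sketched without real Lucene classes. The `Field` and `Document` classes below are hypothetical stand-ins, not the PyLucene API:

```python
# Hypothetical stand-ins for lucene.Field / lucene.Document, just to show the pattern.
class Field:
    def __init__(self, name, value):
        self.name = name
        self.value = value

    def setValue(self, value):
        # Mirrors Lucene's Field.setValue(String): swap the value in place.
        self.value = value

class Document:
    def __init__(self):
        self.fields = {}

    def add(self, field):
        self.fields[field.name] = field

    def getField(self, name):
        return self.fields[name]

# Reuse one Document and one Field across all records: update the value,
# instead of doc.removeField(...) followed by doc.add(...) per record.
doc = Document()
doc.add(Field("body", ""))
for text in ["first record", "second record"]:
    doc.getField("body").setValue(text)
    # writer.addDocument(doc) would go here in real indexing code
```

The point is that only the field's value changes per document; the Document and Field objects themselves are allocated once.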

Re: java.lang.NullPointerException while indexing on linux

2008-08-19 Thread Aditi Goyal
Thanks Michael and Ian for your valuable response. I am attaching a small default code. Please have a look and tell me where am I going wrong. import lucene from lucene import Document, Field, initVM, CLASSPATH doc = Document() fieldA = Field('fieldA', , Field.Store.YES,

Updating tag-indexes

2008-08-19 Thread Ivan Vasilev
Hi Lucene Guys, I have a question that is simple but is important for me. I did not find the answer in the javadoc so I am asking here. When adding Document-s by the method IndexWriter.addDocument(doc), do the documents obtain Lucene IDs in the order that they are added to the IndexWriter?

Re: java.lang.NullPointerException while indexing on linux

2008-08-19 Thread Michael McCandless
On quick look that code looks fine, though removeField is an expensive operation and unnecessary for this. We really need the full traceback of the exception. Mike Aditi Goyal wrote: Thanks Michael and Ian for your valuable response. I am attaching a small default code. Please have a

Re: Updating tag-indexes

2008-08-19 Thread Michael McCandless
Yes, docIDs are currently sequentially assigned, starting with 0. BUT: on hitting an exception (say in your analyzer) it will usually use up a docID (and then immediately mark it as deleted). Also, this behavior isn't promised in the API, ie it could in theory (though I think it unlikely)
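The behavior Michael describes — sequential docIDs, where a document that fails analysis still consumes an ID and is immediately marked deleted — can be simulated in a few lines. This is an illustrative model, not Lucene code:

```python
def index_all(docs, analyzer_ok):
    """Simulate Lucene docID assignment: IDs are handed out sequentially,
    and a document whose analysis throws still uses up an ID, which is
    then immediately marked deleted."""
    assigned, deleted = {}, set()
    next_id = 0
    for doc in docs:
        doc_id = next_id
        next_id += 1  # the ID is consumed whether or not analysis succeeds
        if analyzer_ok(doc):
            assigned[doc] = doc_id
        else:
            deleted.add(doc_id)  # ID used up, then flagged as deleted
    return assigned, deleted

ok = lambda d: d != "bad"
assigned, deleted = index_all(["a", "bad", "b"], ok)
# assigned == {"a": 0, "b": 2}; deleted == {1} -- note the gap at ID 1
```

This is why the docID sequence can have gaps even though assignment is sequential, and why relying on it is fragile.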

RE: Case Sensitivity

2008-08-19 Thread Dino Korah
Hi Guys, From the discussion here what I could understand was, if I am using StandardAnalyzer on TOKENIZED fields, for both Indexing and Querying, I shouldn't have any problems with cases. But if I have any UN_TOKENIZED fields there will be problems if I do not case-normalize them myself before
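The key point in this thread is that UN_TOKENIZED fields bypass the analyzer, so any lower-casing has to be done by the application, and identically on both sides. A minimal sketch:

```python
def normalize_keyword(value):
    """StandardAnalyzer lower-cases TOKENIZED text for you, but
    UN_TOKENIZED (keyword) fields skip the analyzer entirely, so the
    application must case-normalize them itself -- on BOTH the indexing
    side and the querying side."""
    return value.lower()

indexed_value = normalize_keyword("Eiffel-Tower")   # what goes into the index
query_value   = normalize_keyword("eiffel-TOWER")   # what the user typed
# The two only match because both passed through the same normalization.
```

Applying the normalization on only one side is exactly the mismatch that makes UN_TOKENIZED fields appear case-sensitive.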

How I can find wildcard symbol with WildcardQuery?

2008-08-19 Thread Сергій Карпенко
Hello. For example, we have the text: Hello w*orld. It's indexed as NO_NORMS, so this phrase is a single term. And I have the code: Query query = new WildcardQuery(new Term(field, Hello w*orld)); it works, but I need the '*' treated as an ordinary character, not as a wildcard

Re: Multiple index performance

2008-08-19 Thread Erick Erickson
Another issue is opening/closing your indexes. When you open an index for searching, the first few queries you fire invoke considerable overhead as caches warm up, etc. Plus, you don't get any efficiencies of scale (that is, pretty soon adding 2X the amount of text to an index increases the size

Re: Simple Query Question

2008-08-19 Thread Erick Erickson
As Ian says, but you can set the default to AND or OR, see the API docs. The 'out of the box' default is OR. See QueryParser.setDefaultOperator Best Erick On Tue, Aug 19, 2008 at 4:30 AM, Ian Lea [EMAIL PROTECTED] wrote: No, lucene does not automatically replace spaces with AND. See
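The effect of QueryParser's default operator can be shown with a toy matcher, independent of Lucene. The function below is an illustration of the semantics, not the QueryParser implementation:

```python
def matches(doc_terms, query_terms, default_op="OR"):
    """Illustrate how the default operator changes which documents a bare
    multi-word query like 'eiffel tower' matches: OR needs any term
    present, AND needs all of them."""
    hits = [t in doc_terms for t in query_terms]
    return all(hits) if default_op == "AND" else any(hits)

doc = {"eiffel", "paris"}
matches(doc, ["eiffel", "tower"], "OR")   # True: one matching term suffices
matches(doc, ["eiffel", "tower"], "AND")  # False: 'tower' is missing
```

With the out-of-the-box OR default, a space between query words does not behave like an implicit AND, which is the source of the original confusion.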

Re: Updating tag-indexes

2008-08-19 Thread Erick Erickson
I'd add to Michael's mail the *strong* recommendation that you provide your own unique doc IDs and use *those* instead. It'll save you a world of grief. Whenever you need to add a new doc to an existing index, you can get the maximum of *your* unique IDs and increment it yourself. One thing to
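Erick's scheme — keep your own unique IDs in a dedicated field and increment the maximum yourself — can be sketched as a small allocator. The class name and startup scan are illustrative assumptions, not a Lucene API:

```python
class MyIdAllocator:
    """Maintain application-level unique IDs stored in a document field,
    instead of relying on Lucene's internal docIDs (which can shift and
    are not guaranteed by the API)."""

    def __init__(self, existing_ids):
        # On startup, scan the index once for the current maximum of YOUR ids.
        self.next_id = max(existing_ids, default=-1) + 1

    def allocate(self):
        """Hand out the next ID for a newly added document."""
        my_id = self.next_id
        self.next_id += 1
        return my_id

alloc = MyIdAllocator([3, 7, 5])   # ids already present in the index
alloc.allocate()                    # -> 8
alloc.allocate()                    # -> 9
```

Unlike internal docIDs, these survive merges and deletions because they travel with the document as an ordinary stored field.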

Re: How I can find wildcard symbol with WildcardQuery?

2008-08-19 Thread Erick Erickson
Before going down this path I'd really recommend you get a copy of Luke and look at your index. Depending upon the analyzer you're using, you may or may not have w*orld indexed. You may have the tokens: w orld with the * dropped completely. As far as I know, NO_NORMS has nothing to do with

RE: windows file system cache

2008-08-19 Thread Robert Stewart
Thank you for the help. It seems that just changing memory usage setting to programs from default of system cache fixed the issue. Now it takes only about 4 GB of system cache instead of 26 GB, and search performance is back to normal (fast). -Original Message- From: Mark Miller

Re: Multiple index performance

2008-08-19 Thread Cyndy
Thanks Anthony, I understand your comment, and I think it makes sense, the only thing is that I have the issue that I need to guarantee privacy to the users, so if I am able to read the indexes (if they are not encrypted), then I can pretty much know what he says in the document, so that is why

Re[2]: How I can find wildcard symbol with WildcardQuery?

2008-08-19 Thread Сергій Карпенко
Yes, you are correct - NO_NORMS has nothing to do with tokenization; it means no analyzer is used, and the string goes into the index as a single term. But what about our wildcard symbols? Re: How I can find wildcard symbol with WildcardQuery? Before going down this path I'd really

Re: Updating tag-indexes

2008-08-19 Thread Ivan Vasilev

Re: Are there any Lucene optimizations applicable to SSD?

2008-08-19 Thread Cedric Ho
Hi, Thanks for the reply =) What aspect of performance do you find lacking? Is it searching or indexing? While we've had stellar results for searches, indexing is only marginally better than on conventional hard disks. Search response time. We used the search log from our production system and test

RE: Case Sensitivity

2008-08-19 Thread Steven A Rowe
Hi Dino, I think you'd benefit from reading some FAQ answers, like: Why is it important to use the same analyzer type during indexing and search? http://wiki.apache.org/lucene-java/LuceneFAQ#head-0f374b0fe1483c90fe7d6f2c44472d10961ba63c Also, have a look at the AnalysisParalysis wiki page for

Re: search for special condition.

2008-08-19 Thread Mr Shore
Thank you :) 2008/8/18 장용석 [EMAIL PROTECTED] Hi. Yes, that method is in Lucene. I'm sorry that I misunderstood your words. I hope that you will find the way to do what you want. bye.:) 2008/8/16, Mr Shore [EMAIL PROTECTED]: thanks, Jang, but I didn't find the method isTokenChar maybe

Re: Are there any Lucene optimizations applicable to SSD?

2008-08-19 Thread eks dev
hi Cedric, has nothing to do with SSD... but All queries involves a Date Range Filter and a Publication Filter. We've used WrappingCachingFilters for the Publication Filter for there are only a limited number of combinations for this filter. For the Date Range Filter we just let it run
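The caching-filter idea mentioned above (compute a filter's document set once and reuse it across searches against the same reader) can be sketched with a plain dictionary cache. This is a conceptual model, not the Lucene filter API:

```python
def make_cached_filter(compute_filter):
    """Sketch of a caching filter wrapper: the doc-id set for a filter is
    computed once per (reader, filter-key) pair and reused on every later
    search, which pays off when only a few filter combinations exist."""
    cache = {}

    def cached(reader_id, key):
        if (reader_id, key) not in cache:
            cache[(reader_id, key)] = compute_filter(key)
        return cache[(reader_id, key)]

    return cached

calls = []
def compute(key):
    calls.append(key)            # record how often the expensive path runs
    return {1, 4, 9}             # pretend doc-id set for a publication filter

f = make_cached_filter(compute)
f("reader-1", "pub=NYT")
f("reader-1", "pub=NYT")         # second call is served from the cache
# calls == ["pub=NYT"]: the filter was computed only once
```

This is why caching works well for the publication filter (few distinct combinations) but not for an open-ended date-range filter, where nearly every query produces a new key.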

Re: How I can find wildcard symbol with WildcardQuery?

2008-08-19 Thread Daniel Noll
Сергій Карпенко wrote: Yes, you are correct - NO_NORMS has nothing to do with tokenization, thats mean no analyzers used. Just to avoid this ambiguous, semi-contradicting wording confusing the hell out of anyone... NO_NORMS *does* have something to do with tokenisation -- it implies

Re: Are there any Lucene optimizations applicable to SSD?

2008-08-19 Thread Cedric Ho
Hi eks, My index is fully optimized, but I wasn't aware that I can sort it by fields in Lucene. Could you elaborate on how to do that? By omitTf(), do you mean Fieldable.setOmitNorms(true)? I'll try that. Thanks, Cedric Ho if you have possibility to sort your index once in a while on

RE: How I can find wildcard symbol with WildcardQuery?

2008-08-19 Thread Kwon, Ohsang
Why do you use WildcardQuery? You may not need a wildcard. (maybe..) Use a term query. Term term = new Term(field, Hello w*orld); Query query1 = new TermQuery(term); -Original Message- From: Сергій Карпенко [mailto:[EMAIL PROTECTED] Sent: Tuesday, August 19, 2008 10:20 PM

Re: How I can find wildcard symbol with WildcardQuery?

2008-08-19 Thread Daniel Noll
Kwon, Ohsang wrote: Why do you use WildcardQuery? You may not need a wildcard. (maybe..) Use a term query. What if you need to match a literal wildcard *and* an actual wildcard? :-) Daniel -- Daniel Noll - To

Re: How I can find wildcard symbol with WildcardQuery?

2008-08-19 Thread Daniel Noll
I wrote: What if you need to match a literal wildcard *and* an actual wildcard? :-) Actually this was a rhetorical question, but there is at least one answer: use a regex query instead. Regexes do support escaping the special symbols, so this problem doesn't exist for those. Daniel --
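The escaping idea Daniel points to can be shown with Python's `re` module standing in for a regex-based query: escape the metacharacters so a stored '*' is matched as a literal character, which plain WildcardQuery cannot express.

```python
import re

def literal_wildcard_pattern(term):
    """Escape regex metacharacters so '*' in the indexed term is matched
    literally -- the regex-query analogue of the problem in this thread,
    sketched with Python's re module rather than a Lucene regex query."""
    return re.compile(re.escape(term))

pat = literal_wildcard_pattern("Hello w*orld")
pat.fullmatch("Hello w*orld")   # matches: '*' is a literal character here
pat.fullmatch("Hello world")    # no match: the literal '*' is required
```

A query that should mix a literal '*' with a real wildcard would escape only the literal occurrence and leave the wildcard metacharacter unescaped.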