Hi all,
We are testing Lucene with SSD. No doubt the performance is much
better than that of a normal hard disk.
However, it's still not good enough for our particular case, so I
wonder if there are any tips for optimizing Lucene performance on
SSDs.
For example, I saw that Lucene's
No, Lucene does not automatically replace spaces with AND.
See http://lucene.apache.org/java/2_3_2/queryparsersyntax.html
--
Ian.
On Tue, Aug 19, 2008 at 1:34 AM, DanaWhite [EMAIL PROTECTED] wrote:
For some reason I am thinking I read somewhere that if you queried something
like:
Eiffel
Hi All,
I am using IndexWriter for adding the documents. I am re-using the document
as well as the fields to improve indexing speed, as per the link
http://wiki.apache.org/lucene-java/ImproveIndexingSpeed.
So, for each doc, I am first removing the field using doc.removeField() and then
On Tue, 2008-08-19 at 16:22 +0800, Cedric Ho wrote:
[Lucene on SSD]
However, it's still not good enough for our particular case, so I
wonder if there are any tips for optimizing Lucene performance on
SSDs.
What aspect of performance do you find lacking? Is it searching or
indexing? While
Hi
I don't think you need to remove the field and then add it again, but
I've no idea if that is relevant to your problem or not.
A full stack trace would be more helpful, and maybe an upgrade to 2.3.2,
and maybe a snippet of your code. And what is JCC?
--
Ian.
On Tue, Aug 19, 2008 at 10:09
Ian Lea wrote:
I don't think you need to remove the field and then add it again, but
I've no idea if that is relevant to your problem or not.
That's right: just leave the Field there and change its value
(assuming the doc you are changing to still uses that field).
A full stack trace
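(For reference, a minimal sketch of that reuse pattern, assuming the Lucene 2.3 Field.setValue API; the field name and method are illustrative:)

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    // Reuse one Document and one Field across many adds (Lucene 2.3+):
    void indexAll(IndexWriter writer, String[] texts) throws Exception {
        Document doc = new Document();
        Field body = new Field("body", "", Field.Store.NO, Field.Index.TOKENIZED);
        doc.add(body);
        for (String text : texts) {
            body.setValue(text);     // swap the value in place; no removeField()
            writer.addDocument(doc);
        }
    }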
Thanks Michael and Ian for your valuable response.
I am attaching a small sample of code. Please have a look and tell me where
I am going wrong.
import lucene
from lucene import Document, Field, initVM, CLASSPATH

initVM(CLASSPATH)  # the JVM must be started before using any Lucene classes
doc = Document()
# the initial value and index mode were truncated in the original;
# '' and Field.Index.TOKENIZED are assumptions here
fieldA = Field('fieldA', '', Field.Store.YES, Field.Index.TOKENIZED)
doc.add(fieldA)
Hi Lucene Guys,
I have a question that is simple but is important for me. I did not
find the answer in the javadoc, so I am asking here.
When adding Documents with the method IndexWriter.addDocument(doc), do
the documents obtain Lucene IDs in the order in which they are added to the
IndexWriter?
On a quick look that code looks fine, though removeField is an expensive
operation and unnecessary for this.
We really need the full traceback of the exception.
Mike
Aditi Goyal wrote:
Thanks Michael and Ian for your valuable response.
I am attaching a small sample of code. Please have a
Yes, docIDs are currently sequentially assigned, starting with 0.
BUT: on hitting an exception (say in your analyzer) it will usually
use up a docID (and then immediately mark it as deleted).
Also, this behavior isn't promised in the API, ie it could in theory
(though I think it unlikely)
Hi Guys,
From the discussion here, what I could understand was: if I am using
StandardAnalyzer on TOKENIZED fields, for both indexing and querying, I
shouldn't have any problems with case. But if I have any UN_TOKENIZED
fields, there will be problems if I do not case-normalize them myself before
indexing (and again at query time).
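(A minimal sketch of that manual normalization; the field name "sku" and the variables rawSku/userInput are hypothetical:)

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;

    // Index time: lower-case the UN_TOKENIZED value yourself,
    // since no analyzer will touch it.
    Document doc = new Document();
    doc.add(new Field("sku", rawSku.toLowerCase(),
            Field.Store.YES, Field.Index.UN_TOKENIZED));

    // Query time: normalize the user's input the same way.
    Query q = new TermQuery(new Term("sku", userInput.toLowerCase()));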
Hello
For example, we have the text:
Hello w*orld
It's indexed as NO_NORMS, so this phrase is a single term.
And I have this code:
Query query = new WildcardQuery(new Term(field, "Hello w*orld")); // this works
But I need the '*' symbol treated as an ordinary character, not as a
wildcard, and there is no way to escape it.
Another issue is opening/closing your indexes. When you open an
index for searching, the first few queries you fire incur considerable
overhead as caches warm up, etc. Plus, you don't get any efficiencies
of scale (that is, pretty soon adding 2X the amount of text to an index
increases the size
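(A sketch of the usual remedy, keeping one searcher open across queries; the index path and 'query' variable are placeholders:)

    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;

    // Open once and share across all queries so caches stay warm;
    // only close and reopen when the index has actually changed.
    IndexSearcher searcher = new IndexSearcher("/path/to/index");
    Hits hits = searcher.search(query); // 'query' built elsewhere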
As Ian says, but you can set the default to AND or OR, see
the API docs.
The 'out of the box' default is OR.
See QueryParser.setDefaultOperator
Best
Erick
On Tue, Aug 19, 2008 at 4:30 AM, Ian Lea [EMAIL PROTECTED] wrote:
No, Lucene does not automatically replace spaces with AND.
See
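(A minimal sketch of changing the default operator, assuming the 2.3-era QueryParser API; the field name is illustrative:)

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Query;

    QueryParser parser = new QueryParser("contents", new StandardAnalyzer());
    parser.setDefaultOperator(QueryParser.AND_OPERATOR); // out of the box it is OR_OPERATOR
    Query q = parser.parse("Eiffel Tower"); // now requires both terms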
I'd add to Michael's mail the *strong* recommendation that you provide
your own unique doc IDs and use *those* instead. It'll save you a world
of grief. Whenever you need to add a new doc to an existing index, you
can get the maximum of *your* unique IDs and increment it yourself.
One thing to
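(A sketch of carrying your own ID; the "uid" field, nextId counter, 'text' and 'writer' are all illustrative:)

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    // Carry an application-level ID instead of relying on Lucene docIDs,
    // which can shift as deletions are merged away.
    Document doc = new Document();
    doc.add(new Field("uid", Long.toString(nextId++),
            Field.Store.YES, Field.Index.UN_TOKENIZED));
    doc.add(new Field("contents", text, Field.Store.NO, Field.Index.TOKENIZED));
    writer.addDocument(doc);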
Before going down this path I'd really recommend you get a copy of Luke
and look at your index. Depending upon the analyzer you're using, you
may or may not have w*orld indexed. You may have the tokens:
w
orld
with the * dropped completely.
As far as I know, NO_NORMS has nothing to do with
Thank you for the help. It seems that just changing the memory usage setting
to favour programs instead of the default of system cache fixed the issue.
Now it takes only about 4 GB of system cache instead of 26 GB, and search
performance is back to
normal (fast).
-Original Message-
From: Mark Miller
Thanks Anthony,
I understand your comment, and I think it makes sense. The only thing is
that I need to guarantee privacy to the users: if I am able to read the
indexes (because they are not encrypted), then I can pretty much know what a
user says in the document, so that is why
Yes, you are correct - NO_NORMS has nothing to do with tokenization; it
means no analyzer is used, and the string goes into the index as a single term.
But what about our wildcard symbols?
Re: How can I find a wildcard symbol with WildcardQuery?
Before going down this path I'd really
Hi,
Thanks for the reply =)
What aspect of performance do you find lacking? Is it searching or
indexing? While we've had stellar results for searches, indexing is only
so-so, not much better than conventional hard disks.
Search response time. We used the search log from our production
system and test
Hi Dino,
I think you'd benefit from reading some FAQ answers, like:
Why is it important to use the same analyzer type during indexing and search?
http://wiki.apache.org/lucene-java/LuceneFAQ#head-0f374b0fe1483c90fe7d6f2c44472d10961ba63c
Also, have a look at the AnalysisParalysis wiki page for
Thank you :)
2008/8/18 장용석 [EMAIL PROTECTED]
Hi.
Yes, that method is in Lucene.
I'm sorry that I misunderstood your words.
I hope you will find the way you want.
Bye :)
2008/8/16, Mr Shore [EMAIL PROTECTED]:
Thanks, Jang, but I didn't find the method isTokenChar. Maybe
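(For what it's worth, isTokenChar is a protected method on CharTokenizer, meant to be overridden in a subclass rather than called directly; a minimal sketch with a made-up subclass name:)

    import java.io.Reader;
    import org.apache.lucene.analysis.CharTokenizer;

    // A tokenizer that keeps letters and digits and splits on everything else.
    public class LetterOrDigitTokenizer extends CharTokenizer {
        public LetterOrDigitTokenizer(Reader in) {
            super(in);
        }
        protected boolean isTokenChar(char c) {
            return Character.isLetterOrDigit(c);
        }
    }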
Hi Cedric,
has nothing to do with SSD... but
All queries involve a Date Range Filter and a Publication Filter.
We've used CachingWrapperFilters for the Publication Filter, since there
are only a limited number of combinations for this filter. For the
Date Range Filter we just let it run
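(A sketch of that caching pattern; the "pub" field and term are hypothetical, 'searcher' and 'query' come from elsewhere:)

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.CachingWrapperFilter;
    import org.apache.lucene.search.Filter;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.QueryWrapperFilter;
    import org.apache.lucene.search.TermQuery;

    // The wrapped filter's bits are computed once per IndexReader and
    // reused on every later search that passes the same filter instance.
    Filter pubFilter = new CachingWrapperFilter(
            new QueryWrapperFilter(new TermQuery(new Term("pub", "times"))));
    Hits hits = searcher.search(query, pubFilter);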
Сергій Карпенко wrote:
Yes, you are correct - NO_NORMS has nothing to do with tokenization; it
means no analyzer is used.
Just to avoid this ambiguous, semi-contradicting wording confusing the
hell out of anyone...
NO_NORMS *does* have something to do with tokenisation -- it implies the
value is indexed as a single, un-analysed term (like UN_TOKENIZED), just
with norms disabled as well.
Hi eks,
My index is fully optimized, but I wasn't aware that I could sort it by
fields in Lucene. Could you elaborate on how to do that?
By omitTf(), do you mean Fieldable.setOmitNorms(true)? I'll try that.
Thanks,
Cedric Ho
if you have the possibility to sort your index once in a while on
Why do you use WildcardQuery? You may not need a wildcard (maybe..).
Use a term query.
Term term = new Term(field, "Hello w*orld");
Query query1 = new TermQuery(term);
-Original Message-
From: Сергій Карпенко [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 19, 2008 10:20 PM
Kwon, Ohsang wrote:
Why do you use WildcardQuery? You may not need a wildcard (maybe..).
Use a term query.
What if you need to match a literal wildcard *and* an actual wildcard? :-)
Daniel
--
Daniel Noll
I wrote:
What if you need to match a literal wildcard *and* an actual wildcard? :-)
Actually this was a rhetorical question, but there is at least one
answer: use a regex query instead. Regexes do support escaping the
special symbols, so this problem doesn't exist for those.
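(A sketch, assuming the contrib/regex module's RegexQuery is on the classpath; "field" is a placeholder:)

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.regex.RegexQuery;

    // The backslash escapes the star so it matches a literal '*';
    // an unescaped '.*' elsewhere would still act as a real wildcard.
    Query q = new RegexQuery(new Term("field", "Hello w\\*orld"));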
Daniel
--