Hi!
I'm testing Lucene 1.4.2 on two very different configs, but with the same index.
I'm very surprised by the results: Both systems are searching at about the same
speed, but I'd
expect (and I really need) to run Lucene a lot faster on my stronger config.
Config #1 (a notebook):
WinXP Pro,
Interesting, what are your merge settings
Sorry, I didn't mention that I was talking about search performance.
I'm using the same, fully optimized index on both systems.
(I've generated both indexes with the same code from the same database on the
actual OS)
which JDK are you using?
I'm
What file systems are you people using Lucene on? And what are your
experiences?
http://www.apple.com/xsan/
Actually it is a beta version and has some small issues, but it is very fast
and easy to manage once you get it installed.
The installation itself is tricky since it is very
Hi Guys
Apologies.
On Yahoo and AltaVista, a search for a word like 'kid' returns
suggestions similar to the ones below.
Also try: kid rock, kid games, star wars kid, karate kid More...
How can I obtain similar search suggestions using Lucene?
Thanks in advance
Warm regards
They probably create a list of similar results by doing some sort of
data mining on the search criteria that people use in succession, so for
example someone, or they have a list of searches that are too general (a
search for the word kid is at best stupid) but you can't call your users
stupid
Hi Guys
Apologies.
In the search API [ package org.apache.lucene.document.Document ],
will 'public final String[] getValues(String name)' return
all the docs without looping through them?
Please explain with an example.
Thanks in advance
WITH WARM
Hi Sanyi
Could you try XP on your desktop - that would take some variables out. The
problem is that you are comparing OS, as well as filesystems, as well as
different hardware configs.
Also, unless you take your hyperthreading off, with just one index you are
searching with just one half of the
Sanyi wrote:
I'm testing Lucene 1.4.2 on two very different configs, but with the same index.
I'm very surprised by the results: Both systems are searching at about the same
speed, but I'd
expect (and I really need) to run Lucene a lot faster on my stronger config.
Config #1 (a notebook):
WinXP
On Tue, 30 Nov 2004 12:07:46 -, Pete Lewis [EMAIL PROTECTED] wrote:
Also, unless you take your hyperthreading off, with just one index you are
searching with just one half of the CPU - so your desktop is actually using
a 1.5GHz CPU for the search. So, taking account of this it's not too
Could you try XP on your desktop
Sure, but I'll only do that if I run out of ideas.
so your desktop is actually using
a 1.5GHz CPU for the search.
No, this is not true. It uses a 3.0GHz P4 then.
(HT means that you have two 3.0GHz P4s)
So, it is still surprising to me.
Regards,
Sanyi
The notebook is quite good, e.g. the Pentium-M might be faster than
your Pentium 4. At least it has a similar speed, because of its better
internal design. Never compare CPUs of different types by their
frequency.
Ok, this might be true, but:
All of my other tests where the CPU is involved,
As a generalisation, SuSE itself is not a lot slower than Windows XP.
I also very much doubt that filesystem is a factor. If you want to
test w/out filesystem involvement, simply load your index into a
RAMDirectory instead of using FSDirectory. That precludes filesystem
overhead in searches.
How large is the index? If it's less than a couple of GByte then it
will be entirely in memory
It is 3GBytes big and it will grow a lot.
I have to search from the HDD which is very fast compared to the notebook's HDD.
Average seek time:
Notebook: 8-9ms
Desktop: 3.9ms
Data read:
Notebook:
simply load your index into a
RAMDirectory instead of using FSDirectory.
I have 3GByte RAM and my index is 3GByte big currently. (it'll be soon about
4GByte)
So, I have to find this out another way.
First off, 1.8GHz Pentium-M machines are supposed to run at about the
speed of a 2.4GHz
Dear all,
Yesterday I asked a question about getting the similarity matrix of a
collection of documents from an index, but I got only one answer, so
perhaps my question was not very clear.
I will try to reformulate:
I want to use Lucene to have efficient access to an index of a
collection of
On Nov 30, 2004, at 7:10 AM, Karthik N S wrote:
In the search API [ package
org.apache.lucene.document.Document ],
will 'public final String[] getValues(String name)' return
all the docs without looping through them?
getValues(fieldName) returns a String[] of the values of
I also have the same task as you do. According to my understanding,
if there are N documents, your approach will take N^2 similarity
calculations.
Although there are N(N-1)/2 distinct document pairs,
the similarity calculation (according to my understanding) in Lucene is
asymmetric, so
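To make the counting argument above concrete, here is a minimal sketch of the two pair counts. The class and method names are hypothetical and nothing here calls Lucene; it only shows that an asymmetric score needs every ordered pair (i, j) with i != j, roughly N^2 calculations, while a symmetric score would need only the N(N-1)/2 distinct pairs.

```java
// Hypothetical helper for illustration; none of these names come from Lucene.
class PairCounts {
    // Ordered pairs (i, j) with i != j: needed when score(i, j) != score(j, i).
    static long orderedPairs(long n) {
        return n * (n - 1);
    }

    // Unordered pairs {i, j}: enough when the score is symmetric.
    static long distinctPairs(long n) {
        return n * (n - 1) / 2;
    }

    public static void main(String[] args) {
        long n = 1000; // assumed collection size, chosen only for the example
        System.out.println("ordered pairs:  " + orderedPairs(n));
        System.out.println("distinct pairs: " + distinctPairs(n));
    }
}
```

For any N, the ordered count is exactly twice the distinct count, which is the factor of two that an asymmetric score costs.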
You may want to give the IBM JVM a try - I've found it faster in some cases...
http://www-106.ibm.com/developerworks/java/jdk/linux140/
Dan
As I understand hyperthreading, this is not true:
Also, unless you take your hyperthreading off, with just one index you are
searching with just one half of the CPU - so your desktop is actually using
a 1.5GHz CPU for the search.
You still have the full speed of the processor available - the
I have seen different versions of Lucene's ranking function
in the similarity documentation and on the Lucene user list.
Since I need to get document-document similarities,
what I do is issue the document as a query directly.
I found the result is different if we issue computer computer
to Lucene versus if we issue
Hello,
Lucene indexing completes in 13-15 hours on the desktop system while
it completes in about 29-33
hours on the notebook.
Now, combine it with the DROP INDEX tests completing in the same
amount of time on both and find
out why the search is only slightly faster :)
Until then, all
Hello,
I don't think Lucene can spit out the similarity matrix for you, but
perhaps you can use Lucene's Term Vector support to help you build the
matrix yourself:
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/TermFreqVector.html
The other relevant sections of the Lucene API
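As a rough illustration of building the matrix yourself from per-document term frequencies, here is a minimal sketch in plain Java. It assumes the term counts have already been read out of the index into maps (for instance from term vectors); the class name, the map representation, and the choice of cosine similarity are all assumptions for the example, not Lucene's API and not Lucene's own scoring formula.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: term -> frequency maps stand in for term vectors.
class TermVectorCosine {
    // Cosine similarity between two term-frequency vectors.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            int fa = e.getValue();
            normA += fa * fa;
            Integer fb = b.get(e.getKey());
            if (fb != null) {
                dot += fa * fb; // only shared terms contribute
            }
        }
        for (int fb : b.values()) {
            normB += fb * fb;
        }
        if (normA == 0 || normB == 0) {
            return 0.0; // an empty document is similar to nothing
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Cosine is symmetric, so each pair is computed once and mirrored.
    static double[][] similarityMatrix(List<Map<String, Integer>> docs) {
        int n = docs.size();
        double[][] m = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i; j < n; j++) {
                double s = cosine(docs.get(i), docs.get(j));
                m[i][j] = s;
                m[j][i] = s;
            }
        }
        return m;
    }

    public static void main(String[] args) {
        Map<String, Integer> d0 = new HashMap<>();
        d0.put("lucene", 2);
        d0.put("index", 1);
        Map<String, Integer> d1 = new HashMap<>();
        d1.put("lucene", 1);
        d1.put("search", 3);
        List<Map<String, Integer>> docs = new ArrayList<>();
        docs.add(d0);
        docs.add(d1);
        double[][] m = similarityMatrix(docs);
        System.out.println("sim(d0, d1) = " + m[0][1]);
    }
}
```

Because this measure is symmetric, only the upper triangle is computed; a measure that is not symmetric would need the full matrix.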
This might be a stupid question.
When performing retrieval for a query, does Lucene first get
a subset of candidate matches and then perform the ranking
on that set? That is, is the similarity calculation performed only
on a subset of the documents for the query?
If so, from which module could I get
QueryParser does use Analyzer, see this:
  static public Query parse(String query, String field, Analyzer analyzer)
      throws ParseException {
    QueryParser parser = new QueryParser(field, analyzer);
    return parser.parse(query);
  }
Otis
P.S.
Use lucene-user list, please.
--- Ricardo
On Tuesday 30 November 2004 18:46, Xiangyu Jin wrote:
This might be a stupid question.
When performing retrieval for a query, does Lucene first get
a subset of candidate matches and then perform the ranking
on that set? That is, is the similarity calculation performed only
on a subset of the
Thanks to you all for the replies.
I was looking for someone with the same experiences as mine, but it seems
that I'll have to
test this myself.
I'll try out my ideas and the most interesting ideas from you guys.
Regards,
Sanyi
Here is a problem I am experiencing with Lucene searches on non-tokenized
fields:
A search in quotes on a field named Build with the query \orig\ does not
work but the query origi yields 62 hits
I have run indexing on the field with the following method
On Nov 30, 2004, at 4:42 PM, Allen Atamer wrote:
A search in quotes on a field named Build with the query \orig\
does not
work but the query origi yields 62 hits
I have run indexing on the field with the following method
doc.add(Field.Keyword(data.getColumnName(j),
: A possible solution would be to initialize in turn each document as a
: query, do a search using an IndexSearcher and to take from the search
: result the similarity between the query (which is in fact a document)
: and all the other documents. This is highly redundant, because the
: similarity
Erik,
-Original Message-
Here's a log of the parsed query before going to the searcher:
Parsed query: (Build:origi) for the first search
Parsed query: (Build:origi) for the second search
What do you mean by parsed, since below you say you're not using
QueryParser/Analyzer.
Hi Guys
Apologies...
Is there any API in Lucene which can retrieve all the search results in a
single fetch
into some sort of array, WITHOUT using the looping
process below? [ This would make
the search and display faster. ]
for (int i = 0; i