What is the best file system for Lucene?

2004-11-30 Thread Sanyi
Hi! I'm testing Lucene 1.4.2 on two very different configs, but with the same index. I'm very surprised by the results: Both systems are searching at about the same speed, but I'd expect (and I really need) to run Lucene a lot faster on my stronger config. Config #1 (a notebook): WinXP Pro,

Re: What is the best file system for Lucene?

2004-11-30 Thread Sanyi
"Interesting, what are your merge settings?" Sorry, I didn't mention that I was talking about search performance. I'm using the same fully optimized index on both systems (I generated both indexes with the same code, from the same database, on each OS). "Which JDK are you using?" I'm

Re: What is the best file system for Lucene?

2004-11-30 Thread sg
"What file systems are you people using Lucene on? And what are your experiences?" http://www.apple.com/xsan/ It is actually a beta version and has some small issues, but it is very fast and easy to manage once you get it installed. The installation itself is tricky since it is very

SEARCH CRITERIA

2004-11-30 Thread Karthik N S
Hi Guys, Apologies. On Yahoo and AltaVista, searching for a word like 'kid' returns related suggestions such as: Also try: kid rock, kid games, star wars kid, karate kid More... How can I obtain similar search suggestions using Lucene? Thanks in advance. Warm regards

Re: SEARCH CRITERIA

2004-11-30 Thread Nader Henein
They probably create a list of similar results by doing some sort of data mining on the search criteria that people use in succession, so for example someone, or they have a list of searches that are too general (a search for the word kid is at best stupid), but you can't call your users stupid
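A toy sketch of the query-log mining Nader describes, assuming a log grouped into per-user sessions: it counts how often query B directly follows query A and suggests A's most frequent followers. The class name, methods, and session data are hypothetical; this is one simple way to do it, not how Yahoo or AltaVista actually work, and it is independent of Lucene.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical "related searches" miner: counts how often query B directly
// follows query A within one user session, then suggests A's most frequent
// followers. The session data in main() is made up for illustration.
final class RelatedQueries {
    // query A -> (follow-up query B -> number of times B followed A)
    private final Map<String, Map<String, Integer>> followers = new HashMap<>();

    void addSession(List<String> queries) {
        for (int i = 0; i + 1 < queries.size(); i++) {
            String current = queries.get(i);
            String next = queries.get(i + 1);
            if (!current.equals(next)) { // ignore immediate repeats
                followers.computeIfAbsent(current, k -> new HashMap<>())
                         .merge(next, 1, Integer::sum);
            }
        }
    }

    List<String> suggest(String query, int limit) {
        Map<String, Integer> counts = followers.getOrDefault(query, new HashMap<>());
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Most frequent follow-up first; ties broken alphabetically.
        entries.sort((x, y) -> {
            int byCount = Integer.compare(y.getValue(), x.getValue());
            return byCount != 0 ? byCount : x.getKey().compareTo(y.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < entries.size() && i < limit; i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        RelatedQueries miner = new RelatedQueries();
        miner.addSession(Arrays.asList("kid", "kid rock"));
        miner.addSession(Arrays.asList("kid", "kid rock", "kid rock tour"));
        miner.addSession(Arrays.asList("kid", "karate kid"));
        System.out.println(miner.suggest("kid", 2)); // [kid rock, karate kid]
    }
}
```

A real deployment would also filter rare or offensive pairs and decay old counts; the suggestions could then be stored in a small Lucene index of their own.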

GETVALUES +SEARCH

2004-11-30 Thread Karthik N S
Hi Guys, Apologies. In the search API [class org.apache.lucene.document.Document], will 'public final String[] getValues(String name)' return all the docs to me without looping through them? Please explain with an example. Thanks in advance. WITH WARM

Re: What is the best file system for Lucene?

2004-11-30 Thread Pete Lewis
Hi Sanyi, Could you try XP on your desktop? That would take some variables out. The problem is that you are comparing OSes, as well as filesystems, as well as different hardware configs. Also, unless you take your hyperthreading off, with just one index you are searching with just one half of the

Re: What is the best file system for Lucene?

2004-11-30 Thread John Haxby
Sanyi wrote: I'm testing Lucene 1.4.2 on two very different configs, but with the same index. I'm very surprised by the results: Both systems are searching at about the same speed, but I'd expect (and I really need) to run Lucene a lot faster on my stronger config. Config #1 (a notebook): WinXP

Re: What is the best file system for Lucene?

2004-11-30 Thread Justin Swanhart
On Tue, 30 Nov 2004 12:07:46 -, Pete Lewis [EMAIL PROTECTED] wrote: Also, unless you take your hyperthreading off, with just one index you are searching with just one half of the CPU - so your desktop is actually using a 1.5GHz CPU for the search. So, taking account of this, it's not too

Re: What is the best file system for Lucene?

2004-11-30 Thread Sanyi
"Could you try XP on your desktop?" Sure, but I'll only do that if I run out of ideas. "...so your desktop is actually using a 1.5GHz CPU for the search." No, this is not true. It uses a 3.0GHz P4 then (HT means that you have two 3.0GHz P4s). So, it is still surprising to me. Regards, Sanyi

Re: AW: What is the best file system for Lucene?

2004-11-30 Thread Sanyi
"The notebook is quite good, e.g. the Pentium-M might be faster than your Pentium 4. At least it has a similar speed, because of its better internal design. Never compare CPUs of different types by their frequency." OK, this might be true, but: all of my other tests in which the CPU is involved,

Re: What is the best file system for Lucene?

2004-11-30 Thread Justin Swanhart
As a generalisation, SuSE itself is not a lot slower than Windows XP. I also very much doubt that the filesystem is a factor. If you want to test without filesystem involvement, simply load your index into a RAMDirectory instead of using FSDirectory. That precludes filesystem overhead in searches.

Re: What is the best file system for Lucene?

2004-11-30 Thread Sanyi
"How large is the index? If it's less than a couple of GBytes then it will be entirely in memory." It is 3 GBytes and it will grow a lot. I have to search from the HDD, which is very fast compared to the notebook's HDD. Average seek time: Notebook: 8-9 ms, Desktop: 3.9 ms. Data read: Notebook:

Re: What is the best file system for Lucene?

2004-11-30 Thread Sanyi
"...simply load your index into a RAMDirectory instead of using FSDirectory." I have 3 GBytes of RAM and my index is currently 3 GBytes (it'll soon be about 4 GBytes), so I have to find this out another way. First off, 1.8GHz Pentium-M machines are supposed to run at about the speed of a 2.4GHz

similarity matrix - more clear

2004-11-30 Thread Roxana Angheluta
Dear all, Yesterday I asked a question about getting the similarity matrix of a collection of documents from an index, but I got only one answer, so perhaps my question was not very clear. I will try to reformulate: I want to use Lucene to have efficient access to an index of a collection of

Re: GETVALUES +SEARCH

2004-11-30 Thread Erik Hatcher
On Nov 30, 2004, at 7:10 AM, Karthik N S wrote: "On Search API the command [ package org.apache.lucene.document.Document ] Will this 'public final String[] getValues(String name)' return me all the docs without looping through?" getValues(fieldName) returns a String[] of the values of

Re: similarity matrix - more clear

2004-11-30 Thread Xiangyu Jin
I also have the same task as you do. According to my understanding, suppose there are N documents; your approach will take N^2 similarity calculations. Although there are N(N-1)/2 distinct document pairs, the similarity calculation (according to my understanding) in Lucene is asymmetric, so
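To make the counting in this message concrete, here is a small arithmetic sketch (class and method names are hypothetical). Issuing each of N documents as a query against all N documents costs N * N evaluations; a symmetric measure needs only the N(N-1)/2 pairs above the diagonal. As Xiangyu notes, that saving applies only if sim(a, b) == sim(b, a), which a query-vs-document score need not satisfy.

```java
// Compares the number of similarity evaluations needed when every document
// is issued as a query against all N documents (N * N) with the number of
// distinct unordered pairs (N(N-1)/2). The halving is only valid when the
// measure is symmetric, i.e. sim(a, b) == sim(b, a).
final class PairCounts {
    static long queryEveryDocument(long n) {
        return n * n;
    }

    static long distinctPairs(long n) {
        return n * (n - 1) / 2;
    }

    // Enumerates the upper triangle (i < j) explicitly, which is the loop a
    // symmetric similarity matrix would use; it must agree with distinctPairs.
    static long countUpperTriangle(int n) {
        long count = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        long n = 10_000;
        System.out.println(queryEveryDocument(n)); // 100000000
        System.out.println(distinctPairs(n));      // 49995000
    }
}
```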

RE: What is the best file system for Lucene?

2004-11-30 Thread Armbrust, Daniel C.
You may want to give the IBM JVM a try - I've found it faster in some cases... http://www-106.ibm.com/developerworks/java/jdk/linux140/ Dan

RE: What is the best file system for Lucene?

2004-11-30 Thread Armbrust, Daniel C.
As I understand hyperthreading, this is not true: "Also, unless you take your hyperthreading off, with just one index you are searching with just one half of the CPU - so your desktop is actually using a 1.5GHz CPU for the search." You still have the full speed of the processor available - the

Lucene's ranking function VS Standard VSM model

2004-11-30 Thread Xiangyu Jin
I have seen different versions of Lucene's ranking function in the Similarity documentation and on the Lucene user list. Since I need document-document similarities, I issue each document directly as a query. I found that the results differ if we issue 'computer computer' to Lucene versus if we issue
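For comparing Lucene's ranking with a textbook vector space model, it can help to see the individual factors. The sketch below re-implements, as plain static methods, the formulas that Lucene 1.4's DefaultSimilarity uses for tf, idf, lengthNorm, queryNorm and coord; it is an illustration under that assumption, not the Lucene class itself (the real lengthNorm also takes a field name, omitted here).

```java
// Re-implementation (for illustration only) of the scoring factors used by
// Lucene 1.4's org.apache.lucene.search.DefaultSimilarity.
// These are plain static methods, not the Lucene classes themselves.
final class DefaultSimilarityFactors {
    // Term frequency is damped by a square root: a term occurring 4 times
    // counts as 2.0, not 4.0 as in a raw-tf vector space model.
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Inverse document frequency: ln(numDocs / (docFreq + 1)) + 1.
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Field length normalisation: 1 / sqrt(number of terms in the field).
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Query normalisation: 1 / sqrt(sum of squared query-term weights).
    static float queryNorm(float sumOfSquaredWeights) {
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }

    // Coordination factor: fraction of the query's clauses that matched.
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    public static void main(String[] args) {
        System.out.println("tf(4)          = " + tf(4f));
        System.out.println("idf(2, 9)      = " + idf(2, 9));
        System.out.println("lengthNorm(16) = " + lengthNorm(16));
        System.out.println("queryNorm(4)   = " + queryNorm(4f));
        System.out.println("coord(1, 2)    = " + coord(1, 2));
    }
}
```

Besides the square-root tf and the coord factor, note that Lucene stores the length norm in the index as a single byte, so the stored values are coarser than these floats; both effects make Lucene scores differ from a plain cosine computation.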

Re: What is the best file system for Lucene?

2004-11-30 Thread Otis Gospodnetic
Hello, Lucene indexing completes in 13-15 hours on the desktop system while it completes in about 29-33 hours on the notebook. Now, combine it with the DROP INDEX tests completing in the same amount of time on both and find out why the search is only slightly faster :) Until then, all

Re: similarity matrix - more clear

2004-11-30 Thread Otis Gospodnetic
Hello, I don't think Lucene can spit out the similarity matrix for you, but perhaps you can use Lucene's Term Vector support to help you build the matrix yourself: http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/TermFreqVector.html The other relevant sections of the Lucene API
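A minimal sketch of building the matrix yourself, assuming each document's terms and frequencies have already been read out (in Lucene 1.4, IndexReader.getTermFreqVector returns a TermFreqVector with getTerms() and getTermFrequencies()). Here they are plain maps so the code runs without Lucene, and the measure is raw-tf cosine similarity, a swapped-in choice rather than Lucene's own scoring. Because cosine is symmetric, only the upper triangle is computed and mirrored.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Builds a symmetric cosine-similarity matrix from per-document term
// frequencies. The maps stand in for what Lucene term vectors would provide.
final class CosineMatrix {
    // Builds a term -> frequency map from a list of tokens.
    static Map<String, Integer> vector(String... tokens) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : tokens) {
            tf.merge(t, 1, Integer::sum);
        }
        return tf;
    }

    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // an empty document is treated as dissimilar to everything
        }
        return dot / (normA * normB);
    }

    private static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int f : v.values()) {
            sum += (double) f * f;
        }
        return Math.sqrt(sum);
    }

    // Computes only pairs with i < j and mirrors them: N(N-1)/2 evaluations.
    static double[][] build(List<Map<String, Integer>> docs) {
        int n = docs.size();
        double[][] sim = new double[n][n];
        for (int i = 0; i < n; i++) {
            sim[i][i] = docs.get(i).isEmpty() ? 0.0 : 1.0;
            for (int j = i + 1; j < n; j++) {
                double s = cosine(docs.get(i), docs.get(j));
                sim[i][j] = s;
                sim[j][i] = s;
            }
        }
        return sim;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> docs = Arrays.asList(
                vector("lucene", "lucene", "index"),
                vector("lucene", "index", "index"),
                vector("python"));
        System.out.println(Arrays.deepToString(build(docs)));
    }
}
```

For large collections the N(N-1)/2 comparisons still dominate; iterating over shared terms via an inverted index avoids comparing documents that have no term in common.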

Does Lucene perform ranking in the retrieved set?

2004-11-30 Thread Xiangyu Jin
This might be a stupid question. When performing retrieval for a query, does Lucene first get a subset of candidate matches and then perform the ranking on that set? That is, is the similarity calculation performed only on a subset of the documents for the query? If so, from which module could I get

Re: Does QueryParser uses Analyzer ?

2004-11-30 Thread Otis Gospodnetic
QueryParser does use an Analyzer; see this: static public Query parse(String query, String field, Analyzer analyzer) throws ParseException { QueryParser parser = new QueryParser(field, analyzer); return parser.parse(query); } Otis P.S. Please use the lucene-user list. --- Ricardo

Re: Does Lucene perform ranking in the retrieved set?

2004-11-30 Thread Paul Elschot
On Tuesday 30 November 2004 18:46, Xiangyu Jin wrote: This might be a stupid question. When performing retrieval for a query, does Lucene first get a subset of candidate matches and then perform the ranking on that set? That is, is the similarity calculation performed only on a subset of the
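Lucene scores only documents that match the query: its scorers walk the postings of the query terms, so a document containing none of them is never visited. A simplified term-at-a-time sketch of that idea, using a hypothetical in-memory index and a toy sqrt(tf) * idf weight rather than Lucene's actual Scorer classes:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified sketch (not Lucene's actual scorer classes): documents are
// scored by walking the posting list of each query term, so only documents
// containing at least one query term ever receive a score.
final class PostingsScorer {
    // term -> (docId -> term frequency in that document)
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();
    private int numDocs = 0;

    int addDocument(List<String> tokens) {
        int docId = numDocs++;
        for (String t : tokens) {
            postings.computeIfAbsent(t, k -> new TreeMap<>()).merge(docId, 1, Integer::sum);
        }
        return docId;
    }

    // Returns scores for matching documents only, using a toy sqrt(tf) * idf weight.
    Map<Integer, Double> search(List<String> queryTerms) {
        Map<Integer, Double> scores = new TreeMap<>();
        for (String term : queryTerms) {
            Map<Integer, Integer> docs = postings.get(term);
            if (docs == null) {
                continue; // term not in the index: nothing to visit
            }
            double idf = Math.log(numDocs / (double) (docs.size() + 1)) + 1.0;
            for (Map.Entry<Integer, Integer> p : docs.entrySet()) {
                scores.merge(p.getKey(), Math.sqrt(p.getValue()) * idf, Double::sum);
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        PostingsScorer index = new PostingsScorer();
        index.addDocument(Arrays.asList("apache", "lucene"));
        index.addDocument(Arrays.asList("java", "search"));
        index.addDocument(Arrays.asList("lucene", "lucene", "index"));
        // Document 1 contains no query term, so it is never visited or scored.
        System.out.println(index.search(Arrays.asList("lucene")).keySet()); // [0, 2]
    }
}
```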

Re: What is the best file system for Lucene?

2004-11-30 Thread Sanyi
Thanks to you all for the replies. I was looking for someone with the same experiences as mine, but it seems that I'll have to test this myself. I'll try out my ideas and the most interesting ideas from you guys. Regards, Sanyi

literal search in quotes on non-tokenized field

2004-11-30 Thread Allen Atamer
Here is a problem I am experiencing with Lucene searches on non-tokenized fields: A search in quotes on a field named Build with the query \orig\ does not work, but the query origi yields 62 hits. I have run indexing on the field with the following method

Re: literal search in quotes on non-tokenized field

2004-11-30 Thread Erik Hatcher
On Nov 30, 2004, at 4:42 PM, Allen Atamer wrote: A search in quotes on a field named Build with the query \orig\ does not work, but the query origi yields 62 hits. I have run indexing on the field with the following method: doc.add(Field.Keyword(data.getColumnName(j),

Re: similarity matrix - more clear

2004-11-30 Thread Chris Hostetter
A possible solution would be to initialize in turn each document as a query, do a search using an IndexSearcher and to take from the search result the similarity between the query (which is in fact a document) and all the other documents. This is highly redundant, because the similarity

RE: literal search in quotes on non-tokenized field

2004-11-30 Thread Allen Atamer
Erik, -Original Message- Here's a log of the parsed query before going to the searcher: Parsed query: (Build:origi) for the first search; Parsed query: (Build:origi) for the second search. What do you mean by parsed, since below you say you're not using QueryParser/Analyzer?

RE: GETVALUES +SEARCH

2004-11-30 Thread Karthik N S
Hi Guys, Apologies... Is there any API in Lucene which can retrieve all the search values in a single fetch into some sort of array, WITHOUT using the looping process below? [This would make searching and display faster.] for (int i = 0; i