cost of proximity question

2004-11-15 Thread Anson Lau
Hi all,

Does anyone know the performance cost of a Nutch-like proximity query
that looks like this: (+Hello +World +"Hello world"~p^a)x  ?  Or, more
generally, how much processing does proximity add to a query?

Thanks,
Anson




-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: Lucene vs. MySQL Full-Text

2004-07-21 Thread Anson Lau
Depending on what MySQL full-text search supports, you will probably lose some
of the advanced things you get for free from Lucene, such as proximity
search, wildcard search, search term and search field boosting, scoring of
the documents, etc.

After all, it depends on what you need to do.  Our dev team is actually
having a mini debate right now over whether to use Lucene for our project or
write something from scratch that's based on a DB.

We need really good performance. I feel Lucene can do our job very well, but
some of our guys feel a DB-based search can give us better
performance on the type of search we do.


Anson

-Original Message-
From: Florian Sauvin [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, July 21, 2004 8:55 AM
To: Lucene Users List
Subject: Re: Lucene vs. MySQL Full-Text


On Jul 20, 2004, at 12:29 PM, Tim Brennan wrote:

 Someone came into my office today and asked me about the project I am
 trying to use Lucene for -- "why aren't you just using a MySQL full-text
 index to do that?" -- and after thinking about it for a few minutes, I
 realized I don't have a great answer.

 MySQL builds inverted indexes for (in theory) doing the same type of
 lookup that Lucene does.  You'd maybe have to build some kind of a layer
 on the front to mimic Lucene's analyzers, but that wouldn't be too hard.

 My only experience with MySQL fulltext is trivial test apps -- but the
 MySQL world does have some significant advantages (it's a known quantity
 from an operations perspective, etc).  Does anyone out there have
 anything more concrete they can add?

 --tim



I'd say that MySQL full text is much slower if you have a lot of
data... that is one of the reasons we started using Lucene (we had a
MySQL DB to do the search); it's way faster!


--

Florian





RE: speeding up lucene search

2004-07-21 Thread Anson Lau
Has anyone tried splitting up an index into smaller chunks without putting
the different indices on different physical disks/boxes?  What sort of
performance gain do you get from it?
 
Anson


-Original Message-
From: John Wang [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, July 21, 2004 7:43 PM
To: Lucene Users List
Subject: Re: speeding up lucene search

In general, yes.
By splitting up a large index into smaller indices, you are
linearizing the search time.
Furthermore, that allows you to make your search distributable.
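
As a rough picture of what splitting buys you, here is a minimal sketch in plain Java, without Lucene on the classpath: each sub-index ("shard") is searched on its own thread and the per-shard hits are merged into one ranked list. `ShardedSearch`, `Shard` and `Hit` are hypothetical names made up for this illustration, not Lucene classes; in Lucene itself this role is played by MultiSearcher and its parallel variant.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical types for illustration only; these are not Lucene classes.
final class ShardedSearch {
    static final class Hit {
        final String docId;
        final float score;
        Hit(String docId, float score) { this.docId = docId; this.score = score; }
    }

    // Stand-in for a searcher over one sub-index.
    interface Shard {
        List<Hit> search(String query, int topN);
    }

    // Searches every shard on its own thread, then merges by descending score.
    static List<Hit> search(List<Shard> shards, String query, int topN) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
        try {
            List<Future<List<Hit>>> pending = new ArrayList<>();
            for (Shard shard : shards) {
                Callable<List<Hit>> task = () -> shard.search(query, topN);
                pending.add(pool.submit(task));
            }
            List<Hit> merged = new ArrayList<>();
            for (Future<List<Hit>> f : pending) {
                merged.addAll(f.get());   // waits for that shard to finish
            }
            merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
            return new ArrayList<>(merged.subList(0, Math.min(topN, merged.size())));
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("shard search failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Whether this helps on a single disk depends on where the bottleneck is; the sketch only shows the fan-out and merge, not any measured gain.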

-John

On Wed, 21 Jul 2004 13:00:28 +1000, Anson Lau [EMAIL PROTECTED] wrote:
 Hello guys,
 
 What are some general techniques to make lucene search faster?
 
 I'm thinking about splitting up the index.  My current index has approx
 1.8 million documents (small documents) and index size is about 550MB.  Am I
 likely to get much gain out of splitting it up and using a
 ParallelMultiSearcher?
 
 Most of my search queries search on 5-10 fields.
 
 Are there other things I should look at?
 
 Thanks to all,
 Anson
 



RE: Weighting database fields

2004-07-21 Thread Anson Lau
Erik,

Is there any benefit to setting the boost during indexing rather than
at query time?

I usually set it when doing a query because you can change the boost values
easily without having to re-index.
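
To illustrate the query-time option: a boost can be written into the query string with the `^` operator, so changing a weight means changing a string rather than rebuilding the index. The helper below is a hypothetical sketch in plain Java; the field names `name` and `description` are taken from the original question in this thread, and the boost values are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds a Lucene-style query string with one boosted
// clause per field. It does no escaping or analysis of the term.
final class BoostedQueryBuilder {
    static String build(String term, Map<String, Float> fieldBoosts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Float> e : fieldBoosts.entrySet()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(e.getKey()).append(':').append(term)
              .append('^').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Float> boosts = new LinkedHashMap<>();
        boosts.put("name", 4.0f);          // assumed weight
        boosts.put("description", 1.0f);   // assumed weight
        // prints: name:lucene^4.0 description:lucene^1.0
        System.out.println(build("lucene", boosts));
    }
}
```

The resulting string would then be handed to the query parser. Because the weights live in the map, they can be tuned per request without touching the index.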

Thanks,
Anson


-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED] 
Sent: Thursday, July 22, 2004 12:52 AM
To: Lucene Users List
Subject: Re: Weighting database fields

On Jul 21, 2004, at 10:09 AM, Anson Lau wrote:
 Apply boost factor to fields when you do a lucene search.

Or... set the boost on the Field during indexing.

Erik



 Anson

 -Original Message-
 From: John Patterson [mailto:[EMAIL PROTECTED]
 Sent: Thursday, July 22, 2004 12:07 AM
 To: [EMAIL PROTECTED]
 Subject: Weighting database fields

 Hi,

 What is the best way to get Lucene to assign weightings to certain fields
 from a database?  For example, the 'name' field should be weighted higher
 than the 'description' field.

 Thanks,

 John.




speeding up lucene search

2004-07-20 Thread Anson Lau
Hello guys,

What are some general techniques to make lucene search faster?

I'm thinking about splitting up the index.  My current index has approx 1.8
million documents (small documents) and index size is about 550MB.  Am I
likely to get much gain out of splitting it up and using a
ParallelMultiSearcher?

Most of my search queries search on 5-10 fields.

Are there other things I should look at?

Thanks to all,
Anson





RE: Scoring without normalization!

2004-07-14 Thread Anson Lau
If you don't mind hacking the source:

In Hits.java

In method getMoreDocs()



// Comment out the following
//float scoreNorm = 1.0f;
//if (length > 0 && scoreDocs[0].score > 1.0f) {
//  scoreNorm = 1.0f / scoreDocs[0].score;
//}

// And just set scoreNorm to 1.
float scoreNorm = 1.0f;
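
For context, here is a standalone sketch (plain Java, not the Lucene source) of what the commented-out block does: when the top raw score exceeds 1.0, every score is multiplied by the reciprocal of the top score, so the best hit ends up at 1.0. The class name and the sample scores are invented for illustration.

```java
import java.util.Arrays;

// Standalone illustration; this is not the Lucene source.
final class ScoreNormDemo {
    // Same rule as the snippet above: scale only when the top score exceeds 1.0.
    // The input is assumed to be sorted by descending score.
    static float[] normalize(float[] rawScoresDescending) {
        float scoreNorm = 1.0f;
        if (rawScoresDescending.length > 0 && rawScoresDescending[0] > 1.0f) {
            scoreNorm = 1.0f / rawScoresDescending[0];
        }
        float[] out = new float[rawScoresDescending.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = rawScoresDescending[i] * scoreNorm;
        }
        return out;
    }

    public static void main(String[] args) {
        float[] raw = {4.0f, 2.0f, 1.0f};   // made-up raw scores
        // prints: [1.0, 0.5, 0.25]
        System.out.println(Arrays.toString(normalize(raw)));
    }
}
```

With the normalization removed, the raw values (4.0, 2.0, 1.0 in this example) would be reported as they are.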


I don't know if you can do it without going into the source.

Anson


-Original Message-
From: Jones G [mailto:[EMAIL PROTECTED] 
Sent: Thursday, July 15, 2004 6:52 AM
To: [EMAIL PROTECTED]
Subject: Scoring without normalization!

How do I remove document normalization from scoring in Lucene? I just want
to stick to TF IDF.

Thanks.





Re: Pool of IndexReaders or Pool of Searchers?

2004-07-11 Thread Anson Lau
Hi,

When I did some load testing on a Lucene-powered search app, using a
pool of index searchers didn't give me any more searches per second
than just using a singleton index searcher.

Anson


Quoting [EMAIL PROTECTED]:

 Hi,
 
 I have multiple threads reading an index.  Should they all be using
 the same IndexReader and using a pool of IndexSearchers?  Or should
 they be using a pool of IndexReaders?
 
 Basically, one reader or many?
 
 Thanks.
 




RE: best ways of using IndexSearcher

2004-06-29 Thread Anson Lau
Otis,

Thanks for the advice.  When you say "This stuff is not really CPU
intensive", are you referring to the search itself or something else?  In my
experience the search tends to be ultimately bound by CPU.

Anson

-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, June 29, 2004 2:51 PM
To: Lucene Users List
Subject: Re: best ways of using IndexSearcher

Anson,

Use a single instance of IndexSearcher and, if you want to always 'see'
even the latest index changes (deletes and adds since you opened the
IndexSearcher) make sure to re-create the IndexSearcher when you detect
that the index version has changed (see
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexReader.html#getCurrentVersion(org.apache.lucene.store.Directory))

When you get the new IndexSearcher, leave the old instance alone - let
the GC take care of it, and don't call close() on it, in case something
in your application is still using that instance.
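
The pattern described above can be sketched roughly as follows. The holder is generic so that it compiles without Lucene; in a real application `S` would be IndexSearcher, `currentVersion` would wrap the getCurrentVersion call mentioned above, and `opener` would open a new searcher. `SearcherHolder` and both parameter names are hypothetical, chosen only for this illustration.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical holder for a single shared searcher instance.
final class SearcherHolder<S> {
    private final LongSupplier currentVersion;   // reads the index version
    private final Supplier<S> opener;            // opens a fresh searcher
    private S searcher;
    private long openedVersion;

    SearcherHolder(LongSupplier currentVersion, Supplier<S> opener) {
        this.currentVersion = currentVersion;
        this.opener = opener;
        this.openedVersion = currentVersion.getAsLong();
        this.searcher = opener.get();
    }

    // Returns the shared instance, replacing it if the index version changed.
    // The old instance is deliberately not closed: another thread may still
    // be using it, so it is left for the garbage collector.
    synchronized S get() {
        long version = currentVersion.getAsLong();
        if (version != openedVersion) {
            searcher = opener.get();
            openedVersion = version;
        }
        return searcher;
    }
}
```

All threads call `get()` and share whatever it returns, so there is one searcher per index version rather than one per thread.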

This stuff is not really CPU intensive.  Disk I/O tends to be the
bottleneck.  If you are working with multiple indices, spread them over
multiple disks (not just partitions, real disks), if you can.

Otis


--- Anson Lau [EMAIL PROTECTED] wrote:
 Hi Guys,
 
 What's the recommended way of using IndexSearcher? Should IndexSearcher be a
 singleton or pooled?  Would pooling provide a more scalable solution by
 allowing you to decide how many IndexSearchers to use based on, say, how many
 CPUs you have on your server?
 
 Thanks,
 
 Anson





best ways of using IndexSearcher

2004-06-28 Thread Anson Lau
Hi Guys,

What's the recommended way of using IndexSearcher? Should IndexSearcher be a
singleton or pooled?  Would pooling provide a more scalable solution by
allowing you to decide how many IndexSearchers to use based on, say, how many
CPUs you have on your server?

Thanks,

Anson





RE: using boost factor

2004-06-23 Thread Anson Lau
Hi guys,

To really customise the scoring in Lucene, it seems one has to go
into the Lucene source.

I spent a fair bit of time looking into this, and it seems to me that not all
of the scoring API is exposed.  The formula documented on the Similarity class
seems to explain how a term is scored, but not, for example, how the final
score of a Boolean query is computed from each individual component. (Please
correct me if I'm wrong.)  Normalisation is another part where the API is
not exposed.

Anson

-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, June 23, 2004 3:51 AM
To: Lucene Users List
Subject: Re: using boost factor

Hello Anson,

I would look at IndexSearcher's explain method:
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/IndexSearcher.html#explain(org.apache.lucene.search.Query,%20int)

This should give you insight into what's contributing to the high/low
scores, thus telling you what you can tweak.  Perhaps it's just the
boost, perhaps some other similarity factors.

Using explain should provide you information such as this, for example:
http://www.mozdex.com/explain.jsp?idx=2&id=2067257&query=goober

I hope this helps.  Somebody else will probably be able to give more
information, but this should get you started while you wait.

Otis

--- Anson Lau [EMAIL PROTECTED] wrote:
 Hi guys,
 
 Let's say I want to search the term "hello world" over 3 fields with
 different boosts:
 
 ((field1:hello field1:world)^0.001 (field2:hello field2:world)^100
 (field3:hello field3:world)^2)
 
 Note I've given field1 a really low boost, a heavy boost to field2 and a
 REALLY heavy boost to field3.
 
 What is happening to me is that a document that matches in both field1 and
 field2 will have a higher score than a document that matches in field3 only,
 even though field3's boost is WAY higher.
 
 Can I change this behaviour such that the match in field3 only will
 actually have a higher score because of the boost?
 
 Thanks,
 
 Anson
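
One factor that may be at work here is the coordination factor in Lucene's default Similarity, which scales a Boolean query's score by the fraction of its clauses that matched. The toy model below is only a sketch under strong assumptions: it ignores tf, idf, field norms and query normalization, and pretends each matched clause contributes exactly its boost. `CoordToy` is a made-up name. Note also that with the boosts exactly as written in the query above, field2 carries ^100 and field3 only ^2.

```java
// Toy model only: real Lucene scores also include tf, idf, field norms and
// query normalization. Each matched clause is assumed to contribute its boost.
final class CoordToy {
    static float score(float[] clauseBoosts, boolean[] matched) {
        float sum = 0f;
        int overlap = 0;
        for (int i = 0; i < clauseBoosts.length; i++) {
            if (matched[i]) {
                sum += clauseBoosts[i];
                overlap++;
            }
        }
        // Coordination factor: the fraction of clauses that matched.
        return sum * overlap / clauseBoosts.length;
    }

    public static void main(String[] args) {
        float[] boosts = {0.001f, 100f, 2f};   // the boosts as written in the query
        float field1And2 = score(boosts, new boolean[] {true, true, false});
        float field3Only = score(boosts, new boolean[] {false, false, true});
        System.out.println("field1+field2: " + field1And2);
        System.out.println("field3 only:   " + field3Only);
    }
}
```

In this model the two-field match wins both because it matches more clauses and because its largest boost is higher; running the explain method on real hits is the way to see which factors actually dominate.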





RE: a list of matching search term

2004-06-02 Thread Anson Lau
Thanks Erik, I'll give that a try.

Anson

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Wednesday, June 02, 2004 7:28 PM
To: Lucene Users List
Subject: Re: a list of matching search term

On Jun 1, 2004, at 9:19 PM, Anson Lau wrote:
 Further to my previous email: The highlighter package should be able to
 pick up the matching search terms.  Can some experienced highlighter
 package users tell me if I should look down that line?

Yes, Highlighter (available in the sandbox) picks out matching terms.
If you used a custom Formatter with Highlighter, you could pick out
matching terms and have a list of them.  This would not be something
you do for every hit, though, as it would take a little time to do for
each document.

Erik





a list of matching search term

2004-06-01 Thread Anson Lau
Hi All,

E.g., let's say someone does a search on the terms 'apple orange banana'.

In the search results, is it possible to find out, for each hit, which of
those terms matched?

I.e., the document with the highest score has all three words, so the matching
terms are all of those words.

A lesser document may only have 'apple' and 'orange' in it, so the
matching terms will be 'apple' and 'orange'.
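
The behaviour being asked for can be pictured with a naive sketch in plain Java. It is not how Lucene or the Highlighter does it: it lower-cases the text and splits on anything that is not a letter or digit instead of using an Analyzer, and it ignores phrases, wildcards and stemming. `MatchingTerms` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Naive illustration: reports which query terms occur in a document's text.
final class MatchingTerms {
    static Set<String> matched(List<String> queryTerms, String documentText) {
        // Crude tokenization standing in for a real Analyzer.
        Set<String> docTokens = new LinkedHashSet<>(
                Arrays.asList(documentText.toLowerCase().split("[^\\p{L}\\p{N}]+")));
        Set<String> result = new LinkedHashSet<>();
        for (String term : queryTerms) {
            if (docTokens.contains(term.toLowerCase())) {
                result.add(term);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("apple", "orange", "banana");
        // prints: [apple, orange]
        System.out.println(matched(query, "An apple and an orange."));
    }
}
```

Doing this needs the document's text for every hit, so it is best reserved for the hits actually shown to the user.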


Thanks,

Anson





RE: a list of matching search term

2004-06-01 Thread Anson Lau
Further to my previous email: The highlighter package should be able to pick
up the matching search terms.  Can some experienced highlighter package
users tell me if I should look down that line?

Thanks a lot.

Anson

-Original Message-
From: Anson Lau [mailto:[EMAIL PROTECTED]
Sent: Tuesday, June 01, 2004 5:20 PM
To: 'Lucene Users List'
Subject: a list of matching search term

Hi All,

E.g., let's say someone does a search on the terms 'apple orange banana'.

In the search results, is it possible to find out, for each hit, which of
those terms matched?

I.e., the document with the highest score has all three words, so the matching
terms are all of those words.

A lesser document may only have 'apple' and 'orange' in it, so the
matching terms will be 'apple' and 'orange'.


Thanks,

Anson





field boost factor

2004-05-14 Thread Anson Lau
Hi all,

Is it possible to set a different boost factor for different fields when you
do a search, rather than when you index?

Thanks,

Anson





RE: field boost factor

2004-05-14 Thread Anson Lau
I think I found it in Query API...

Thanks,

Anson

-Original Message-
From: Anson Lau [mailto:[EMAIL PROTECTED]
Sent: Friday, May 14, 2004 4:27 PM
To: [EMAIL PROTECTED]
Subject: field boost factor

Hi all,

Is it possible to set a different boost factor for different fields when you
do a search, rather than when you index?

Thanks,

Anson





looking for developer

2004-03-28 Thread Anson Lau
Hi All,

Our company is looking for 2 Java developers with strong Lucene
experience to do some contract work.  We're in Sydney, Australia.
If anyone is interested, please email me directly
([EMAIL PROTECTED]).

Thanks,

Anson




RE: looking for developer

2004-03-28 Thread Anson Lau
Esmond,

Thanks a lot for your email.  Will certainly consider that - do you know
another 1-2 people who also know Lucene very well, whom you could work
with in case we need another person?  No point having one developer in
Sydney and one in Melbourne.

To give you a bit more background - we'll be indexing and searching a
database of approx 1.5 million records.  Do you have experience with
this sort of scale?  I don't and I am specifically looking for people
with that.

Thanks,

Anson


-Original Message-
From: Esmond Pitt [mailto:[EMAIL PROTECTED] 
Sent: Monday, March 29, 2004 12:41 PM
To: Lucene Users List
Subject: Re: looking for developer

Anson

I have very strong Lucene experience, covering 1.2, 1.3, and the current
1.4-almost-an-RC, having built search engines for several web sites with
it.

I'm located in Melbourne & not very able to change that, but if you don't
get other bites maybe we can talk about telecommuting & the occasional day
trip?

Esmond Pitt FACS
0400 139 869
- Original Message - 
From: Anson Lau [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Monday, March 29, 2004 12:23 AM
Subject: looking for developer


 Hi All,

 Our company is looking for 2 Java developers with strong Lucene
 experience to do some contract work.  We're in Sydney, Australia.
 If anyone is interested, please email me directly
 ([EMAIL PROTECTED]).

 Thanks,

 Anson




RE: looking for developer

2004-03-28 Thread Anson Lau
Oops - sorry, I should have replied to Esmond directly.  Please ignore the previous message.

Anson





RE : Lucene scalability/clustering

2004-02-24 Thread Anson Lau
RBP,

I'm implementing a search engine for a project at work.  It's going to
index approx 1.5 million rows in a database.

I am trying to get a feel for what my options are when scalability
becomes an issue.  I also want to know if those options require me to
implement my app in a different way right from the start.

Anson

-Original Message-
From: Rasik Pandey [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, February 24, 2004 9:34 PM
To: 'Lucene Users List'
Subject: RE : RE : Lucene scalability/clustering

 I'm trying to see what are some common ways to scale lucene
 onto multiple boxes.  Is RMI based search and using a 
 MultiSearcher the general approach?

More details about what you are attempting would be helpful.


RBP





RE: RE : Lucene scalability/clustering

2004-02-23 Thread Anson Lau
I'm trying to see what some common ways are to scale Lucene onto
multiple boxes.  Is RMI-based search using a MultiSearcher the
general approach?

There don't seem to be many articles on the web about how to implement a
Lucene search cluster.  If anyone knows a good article, can you please
post it here?

Thanks,

Anson

-Original Message-
From: Rasik Pandey [mailto:[EMAIL PROTECTED]
Sent: Monday, February 23, 2004 9:46 PM
To: 'Lucene Users List'
Subject: RE : Lucene scalability/clustering

 Further on this topic - has anyone tried implementing a distributed
 search with Lucene?  How does it work and does it work well?

I assume you are referring to RMI based search? It works well as does
MultiSearcher. 

RBP





RE: Lucene scalability/clustering

2004-02-22 Thread Anson Lau

Further on this topic - has anyone tried implementing a distributed
search with Lucene?  How does it work and does it work well?


Anson


-Original Message-
From: Hamish Carpenter [mailto:[EMAIL PROTECTED]
Sent: Monday, February 23, 2004 5:24 AM
To: Lucene Users List
Subject: Re: Lucene scalability/clustering

Hi All,

I'm Hamish Carpenter who contributed the benchmarks with the comment
about the IndexSearcherCache.  Using this solved our issues with too
many open files under Linux.

The original IndexSearcherCache email is here:
http://www.mail-archive.com/[EMAIL PROTECTED]/msg01967.html

See here for a copy of the above message and a download link:
http://www.geocities.com/haytona/lucene/
The mailing list doesn't like attachments.  The source is 10K in size.

HTH

Hamish Carpenter.

[EMAIL PROTECTED] wrote:
  BTW, where can I get Peter Halacsy's IndexSearcherCache?




number of fields, size of fields

2004-02-17 Thread Anson Lau
Hi All,

I'm a beginner with Lucene.  I would like to know if there are general
guidelines on:

1. the number of fields a document can have
2. size of unindexed fields
3. size of a stored text field

I just want to get a feel for what the good practices are.

Thanks,

Anson Lau

