Re: Best Practices for indexing in Web application

2004-03-03 Thread Michael Steiger
Doug Cutting wrote:
> Michael Steiger wrote:
> > I'm surprised that there are no samples for this job. I do not think
> > that I am the first one looking for this.
>
> If you found this confusing, and would have been helped by some
> examples, please take the time to donate some good examples. Lucene is
> free, but requires donations to improve.
>
> (Even if you don't, just by asking these questions, prodding others to
> answer, you've helped a lot of folks.)

Doug,
first of all, thanks for all of your work!
If I write some code that is donatable (does this word really exist?
;-)) I will do so. But I heard today that the project I was trying to
extend with Lucene has been stopped. There will be no more work on
full-text search and Lucene for the time being. But maybe I'll find some
spare time and finish the work I have started.

Michael

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Best Practices for indexing in Web application

2004-03-03 Thread Doug Cutting
Michael Steiger wrote:
> I'm surprised that there are no samples for this job. I do not think
> that I am the first one looking for this.

If you found this confusing, and would have been helped by some
examples, please take the time to donate some good examples. Lucene is
free, but requires donations to improve.

(Even if you don't, just by asking these questions, prodding others to 
answer, you've helped a lot of folks.)

Doug



Re: Best Practices for indexing in Web application

2004-03-03 Thread Morus Walter
Michael Steiger writes:
> > 
> > Depends on your application, but if you can, it's better to keep IndexSearcher
> > open until the index changes.
> > Otherwise you will have to open all the index files for each search.
> 
> Good tip. So I have to synchronize (logically) my search routine with 
> any updates and if the index changes I have to close the Searcher and 
> reopen it.
> 
Right. The hard part is that you shouldn't close the searcher while
something is still accessing it.
E.g. if you have a scenario such as
- do search
- index changes
- access search results
you cannot close the searcher until you have accessed all search results.
But that can be done with a little bit of reference counting.
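
A minimal sketch of the reference counting mentioned above. `RefCounted` is a hypothetical helper, not part of Lucene; it wraps any `java.io.Closeable`, so the real searcher would need a small adapter whose `close()` closes it. The owner holds one reference from the start and drops it when the index changes; each search takes a reference before searching and gives it back after reading its last result.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;

/**
 * Hypothetical helper, not part of Lucene: wraps a resource (for example a
 * searcher) and closes it only after the last user has released it.
 */
final class RefCounted<T extends Closeable> {
    private final T resource;
    private int refs = 1;          // the owner holds the initial reference
    private boolean closed = false;

    RefCounted(T resource) {
        this.resource = resource;
    }

    /** Call before starting a search; returns the resource for use. */
    synchronized T acquire() {
        if (closed) {
            throw new IllegalStateException("resource already closed");
        }
        refs++;
        return resource;
    }

    /**
     * Call after the last search result has been read. The owner calls it
     * once as well, when the index has changed and a new searcher is opened.
     */
    synchronized void release() {
        if (refs <= 0) {
            throw new IllegalStateException("released more often than acquired");
        }
        refs--;
        if (refs == 0) {
            closed = true;
            try {
                resource.close();  // nobody is using it any more
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    synchronized boolean isClosed() {
        return closed;
    }
}
```

With this, the scenario above becomes: `acquire()` before the search, the owner's `release()` when the index changes, and the search's own `release()` after the results have been read. Only that last call actually closes the resource.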

Morus

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Best Practices for indexing in Web application

2004-03-02 Thread Michael Steiger
Morus Walter wrote:

Michael Steiger writes:

I am using an IndexSearcher for querying the index but for deletions I 
need to use the IndexReader. I now know that I can have Readers and a 
Writer open concurrently but IndexReader.delete can only be used if no 
Writer is open.

You should be aware that an IndexSearcher uses a readonly IndexReader.
So you can't ignore it for your considerations.
I thought so.
I'm wondering that there are no samples for this job. I do not think 
that I am the first one looking for this.

I want to open the IndexSearcher only while searching and close it 
afterwards.

Depends on your application, but if you can, it's better to keep IndexSearcher
open until the index changes.
Otherwise you will have to open all the index files for each search.
Good tip. So I have to synchronize (logically) my search routine with 
any updates and if the index changes I have to close the Searcher and 
reopen it.

Michael

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Best Practices for indexing in Web application

2004-03-02 Thread Michael Steiger
Morus Walter wrote:

Michael Steiger writes:

My problem is that I do not know in advance if or when the "index won't 
change for some time". I think of running a counter for all updates 
(add, delete) and optimize after some threshold.

That's what lucene does anyway. 
Adding documents means adding segments, and when the number of segments
reaches a certain limit, lucene merges these segments.
Optimizing also just merges the segments. So if you introduce your own 
calculation when a merge should be done, your just doubling the effort.
OTOH it might be worth thinking about optimizing the index at times, when
there won't be much searches on the index (e.g. nightly). Depending on
the number of daily updates this might prevent merges during days, when
there is more search access (while searches can be done parallel to 
optimizing, the additional task on the same machine will slow down them
a bit).
I was indeed thinking about doing optimization during nights but only 
after my application-specific threshold would be reached.

There's are two articles about lucenes indexing details written by
Otis Gospodnetic that are linked on the lucene web page, that might
be worth reading. Especially the second (Advanced Text Indexing with Lucene).
Thanks, I have already seen these articles.

Michael

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Best Practices for indexing in Web application

2004-03-02 Thread Morus Walter
Michael Steiger writes:
> 
> I am using an IndexSearcher for querying the index but for deletions I 
> need to use the IndexReader. I now know that I can have Readers and a 
> Writer open concurrently but IndexReader.delete can only be used if no 
> Writer is open.
> 
You should be aware that an IndexSearcher uses a readonly IndexReader.
So you can't ignore it for your considerations.
> 
> I want to open the IndexSearcher only while searching and close it 
> afterwards.
> 
Depends on your application, but if you can, it's better to keep IndexSearcher
open until the index changes.
Otherwise you will have to open all the index files for each search.

Morus

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Best Practices for indexing in Web application

2004-03-02 Thread Morus Walter
Michael Steiger writes:
> 
> My problem is that I do not know in advance if or when the "index won't 
> change for some time". I think of running a counter for all updates 
> (add, delete) and optimize after some threshold.
> 
That's what lucene does anyway. 
Adding documents means adding segments, and when the number of segments
reaches a certain limit, lucene merges these segments.
Optimizing also just merges the segments. So if you introduce your own 
calculation when a merge should be done, your just doubling the effort.
OTOH it might be worth thinking about optimizing the index at times, when
there won't be much searches on the index (e.g. nightly). Depending on
the number of daily updates this might prevent merges during days, when
there is more search access (while searches can be done parallel to 
optimizing, the additional task on the same machine will slow down them
a bit).
There's are two articles about lucenes indexing details written by
Otis Gospodnetic that are linked on the lucene web page, that might
be worth reading. Especially the second (Advanced Text Indexing with Lucene).

Morus

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Best Practices for indexing in Web application

2004-03-02 Thread Michael Steiger
Kelvin Tan wrote:

On Mon, 01 Mar 2004 12:24:26 +0100, Michael Steiger said:

1. Concurrency of IndexWriter and IndexReader
It seems that it is not allowed to open an IndexWriter and an
IndexReader at the same time. But if one user is changing records in the
database (and therefore changing documents in the Lucene index) and
another user is querying the index, I would need to open them both.


Not necessarily so. Shouldn't you be using an IndexSearcher to query the index 
instead of an IndexReader? IndexReader should be used primarily for deleting and 
low-level document retrieval via Terms.
I am using an IndexSearcher for querying the index but for deletions I 
need to use the IndexReader. I now know that I can have Readers and a 
Writer open concurrently but IndexReader.delete can only be used if no 
Writer is open.

2. Optimizing the index
This is maybe related to my first issue.
I assume that while optimizing the index no queries are allowed. How
often should the index be optimized?


Again, IndexSearcher has no problem with the index being modified (via 
optimizing, or otherwise) whilst searching. However, you'll need to have some 
way of refreshing your IndexSearcher when this happens, otherwise the 
IndexSearcher  would be obsolete.
I want to open the IndexSearcher only while searching and close it 
afterwards.

Thanks
Michael
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Best Practices for indexing in Web application

2004-03-02 Thread Michael Steiger
Otis Gospodnetic wrote:

--- Michael Steiger <[EMAIL PROTECTED]> wrote:

Hello Lucene Users,
I have a web application using Oracle as the database and I want to
add 
fulltext query capablities. My idea was to extend the existing
insert, 
update and delete methods of my backend classes to add, delete/add
and 
delete respectively.

I'm new to Lucene and after a bit of googling and reading I found a
few 
issues which I do not know how to resolve in the moment.

1. Concurrency of IndexWriter and IndexReader
It seems that it is not allowed to open an IndexWriter and an 
IndexReader at the same time. But if one user is changing records in
the 
database (and therefore changing documents in the Lucene index) and 
another user is querying the index, I would need to open them both.


This problem should be easily solvable through synchronizing on an
object that is used as a shared lock.
Misunderstanding solved due another response. Should be no problem anymore.


2. Optimizing the index
This is maybe related to my first issue.
I assume that while optimizing the index no queries are allowed. How 
often should the index be optimized?


Queries are allowed while an index is being optimized.
You can optimize as often as you'd like, but the recommended approach
is to optimize only when you know your index won't change for a while,
or if you are running out of file descriptors.
This is covered in at least one of the Lucene articles (links on the
site).
Otis,
thanks for your help
Michael
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Best Practices for indexing in Web application

2004-03-02 Thread Michael Steiger
Morus Walter wrote:

Michael Steiger writes:

Hello Lucene Users,
I have a web application using Oracle as the database and I want to add 
fulltext query capablities. My idea was to extend the existing insert, 
update and delete methods of my backend classes to add, delete/add and 
delete respectively.

I'm new to Lucene and after a bit of googling and reading I found a few 
issues which I do not know how to resolve in the moment.

1. Concurrency of IndexWriter and IndexReader
It seems that it is not allowed to open an IndexWriter and an 
IndexReader at the same time. But if one user is changing records in the 
database (and therefore changing documents in the Lucene index) and 
another user is querying the index, I would need to open them both.

No.
You can have one IndexWriter and an arbitray number of IndexReaders, that
don't write to the index.
Despite it's name IndexReader is used for deletions. Since this is a write
operation, it cannot be done, while a IndexWrite is open.
But that's the only limitation.
OK, misunderstanding found. I already know that IndexReader is mostly 
used for deletions but I thought that I can not have Reader and Writer 
objects open at the same time. I now understand that I can have multiple 
Readers but the delete operations must be synchronized with open Writers.


IndexReaders will not see changes introduced by writers until they are
closed and reopened. And one should be aware that keeping the Readers open
after changes means that the associated files are kept open, which may result
in too many open files.
OK. I only wanted to open a Reader for deletions and close it 
afterwards. So this should be no problem.


2. Optimizing the index
This is maybe related to my first issue.
I assume that while optimizing the index no queries are allowed. How 
often should the index be optimized?

AFAIK there's no problem with searching during optimizing of an index.

IIRC the FAQ says, that optimizing should only be done, if you know, that
your index won't change for some time.
As far as I understand lucenes indexing, the reason is, that lucene will
automatically merge the index files (based on MergeFactor), which is, 
what optimizing does.
My problem is that I do not know in advance if or when the "index won't 
change for some time". I think of running a counter for all updates 
(add, delete) and optimize after some threshold.

Thanks for your help
Michael
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Best Practices for indexing in Web application

2004-03-01 Thread Kelvin Tan


On Mon, 01 Mar 2004 12:24:26 +0100, Michael Steiger said:
> 1. Concurrency of IndexWriter and IndexReader
> It seems that it is not allowed to open an IndexWriter and an
> IndexReader at the same time. But if one user is changing records in the
> database (and therefore changing documents in the Lucene index) and
> another user is querying the index, I would need to open them both.

Not necessarily so. Shouldn't you be using an IndexSearcher to query the index
instead of an IndexReader? IndexReader should be used primarily for deleting and
low-level document retrieval via Terms.

>
> 2. Optimizing the index
> This is maybe related to my first issue.
> I assume that while optimizing the index no queries are allowed. How
> often should the index be optimized?
>

Again, IndexSearcher has no problem with the index being modified (via
optimizing, or otherwise) whilst searching. However, you'll need to have some
way of refreshing your IndexSearcher when this happens, otherwise the
IndexSearcher  would be obsolete.

Kelvin


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Best Practices for indexing in Web application

2004-03-01 Thread Morus Walter
Michael Steiger writes:
> Hello Lucene Users,
> I have a web application using Oracle as the database and I want to add 
> fulltext query capablities. My idea was to extend the existing insert, 
> update and delete methods of my backend classes to add, delete/add and 
> delete respectively.
> 
> I'm new to Lucene and after a bit of googling and reading I found a few 
> issues which I do not know how to resolve in the moment.
> 
> 1. Concurrency of IndexWriter and IndexReader
> It seems that it is not allowed to open an IndexWriter and an 
> IndexReader at the same time. But if one user is changing records in the 
> database (and therefore changing documents in the Lucene index) and 
> another user is querying the index, I would need to open them both.
> 
No.
You can have one IndexWriter and an arbitray number of IndexReaders, that
don't write to the index.
Despite it's name IndexReader is used for deletions. Since this is a write
operation, it cannot be done, while a IndexWrite is open.
But that's the only limitation.

IndexReaders will not see changes introduced by writers until they are
closed and reopened. And one should be aware that keeping the Readers open
after changes means that the associated files are kept open, which may result
in too many open files.

> 2. Optimizing the index
> This is maybe related to my first issue.
> I assume that while optimizing the index no queries are allowed. How 
> often should the index be optimized?
> 
AFAIK there's no problem with searching during optimizing of an index.

IIRC the FAQ says, that optimizing should only be done, if you know, that
your index won't change for some time.
As far as I understand lucenes indexing, the reason is, that lucene will
automatically merge the index files (based on MergeFactor), which is, 
what optimizing does.

HTH
Morus

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Best Practices for indexing in Web application

2004-03-01 Thread Otis Gospodnetic

--- Michael Steiger <[EMAIL PROTECTED]> wrote:
> Hello Lucene Users,
> I have a web application using Oracle as the database and I want to
> add 
> fulltext query capablities. My idea was to extend the existing
> insert, 
> update and delete methods of my backend classes to add, delete/add
> and 
> delete respectively.
> 
> I'm new to Lucene and after a bit of googling and reading I found a
> few 
> issues which I do not know how to resolve in the moment.
> 
> 1. Concurrency of IndexWriter and IndexReader
> It seems that it is not allowed to open an IndexWriter and an 
> IndexReader at the same time. But if one user is changing records in
> the 
> database (and therefore changing documents in the Lucene index) and 
> another user is querying the index, I would need to open them both.

This problem should be easily solvable through synchronizing on an
object that is used as a shared lock.

> 2. Optimizing the index
> This is maybe related to my first issue.
> I assume that while optimizing the index no queries are allowed. How 
> often should the index be optimized?

Queries are allowed while an index is being optimized.
You can optimize as often as you'd like, but the recommended approach
is to optimize only when you know your index won't change for a while,
or if you are running out of file descriptors.

This is covered in at least one of the Lucene articles (links on the
site).

Otis


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Best Practices for indexing in Web application

2004-03-01 Thread Michael Steiger
Hello Lucene Users,
I have a web application using Oracle as the database and I want to add 
fulltext query capablities. My idea was to extend the existing insert, 
update and delete methods of my backend classes to add, delete/add and 
delete respectively.

I'm new to Lucene and after a bit of googling and reading I found a few 
issues which I do not know how to resolve in the moment.

1. Concurrency of IndexWriter and IndexReader
It seems that it is not allowed to open an IndexWriter and an 
IndexReader at the same time. But if one user is changing records in the 
database (and therefore changing documents in the Lucene index) and 
another user is querying the index, I would need to open them both.

2. Optimizing the index
This is maybe related to my first issue.
I assume that while optimizing the index no queries are allowed. How 
often should the index be optimized?

Thanks in advance
Michael
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]