Re: Best Practices for indexing in Web application
Doug Cutting wrote: > Michael Steiger wrote: > > I'm surprised that there are no samples for this job. I do not think that I am the first one looking for this. > If you found this confusing, and would have been helped by some examples, please take the time to donate some good examples. Lucene is free, but requires donations to improve. (Even if you don't, just by asking these questions, prodding others to answer, you've helped a lot of folks.) Doug, first of all, thanks for all of your work! If I write some code that is donatable (does this word really exist? ;-)), I will do so. But I heard today that the project I was trying to extend with Lucene has been stopped, so there will be no more work on full-text search and Lucene for the time being. Maybe I'll find some spare time and finish the work I had started. Michael - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Best Practices for indexing in Web application
Michael Steiger wrote: > I'm surprised that there are no samples for this job. I do not think that I am the first one looking for this. If you found this confusing, and would have been helped by some examples, please take the time to donate some good examples. Lucene is free, but requires donations to improve. (Even if you don't, just by asking these questions, prodding others to answer, you've helped a lot of folks.) Doug
Re: Best Practices for indexing in Web application
Michael Steiger writes: > > Depends on your application, but if you can, it's better to keep the IndexSearcher open until the index changes. Otherwise you will have to open all the index files for each search. > Good tip. So I have to synchronize (logically) my search routine with any updates, and if the index changes I have to close the Searcher and reopen it. Right. The hard part is that you shouldn't close the searcher while something is still accessing that searcher. E.g. in the scenario (1) do a search, (2) the index changes, (3) access the search results, you cannot close the searcher until all search results have been accessed. But that can be handled with a little bit of reference counting. Morus
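Morus's reference-counting idea could look roughly like the following minimal Java sketch. RefCounted is a hypothetical class, not part of Lucene, and any java.io.Closeable stands in for the IndexSearcher so the example compiles without Lucene on the classpath. The holder starts with one "owner" reference; each search takes a reference with tryAcquire() and gives it back with release() once it has finished reading its results, and retire() drops the owner reference when the index has changed. The searcher is closed only when the count reaches zero.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical holder for one shared resource (e.g. an IndexSearcher).
// The resource is closed only after it has been retired AND every search
// that acquired it has released it again.
final class RefCounted<T extends Closeable> {
    private final T resource;
    // Starts at 1: the owner's reference, dropped by retire().
    private final AtomicInteger refs = new AtomicInteger(1);

    RefCounted(T resource) {
        this.resource = resource;
    }

    // Take a reference before searching; returns null if already closed.
    T tryAcquire() {
        while (true) {
            int n = refs.get();
            if (n <= 0) {
                return null; // closed: the caller should use the newer holder
            }
            if (refs.compareAndSet(n, n + 1)) {
                return resource;
            }
        }
    }

    // Give the reference back after the search results have been consumed.
    void release() {
        if (refs.decrementAndGet() == 0) {
            try {
                resource.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    // Called once when the index has changed and a new searcher replaces this one.
    void retire() {
        release();
    }
}
```

Because tryAcquire() returns null once the resource has been closed, a request that races with a refresh can simply fetch the newer holder and try again instead of using a closed searcher.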
Re: Best Practices for indexing in Web application
Morus Walter wrote: > Michael Steiger writes: > > I am using an IndexSearcher for querying the index, but for deletions I need to use the IndexReader. I now know that I can have Readers and a Writer open concurrently, but IndexReader.delete can only be used if no Writer is open. > You should be aware that an IndexSearcher uses a read-only IndexReader, so you can't leave it out of your considerations. I thought so. I'm surprised that there are no samples for this job. I do not think that I am the first one looking for this. > > I want to open the IndexSearcher only while searching and close it afterwards. > Depends on your application, but if you can, it's better to keep the IndexSearcher open until the index changes. Otherwise you will have to open all the index files for each search. Good tip. So I have to synchronize (logically) my search routine with any updates, and if the index changes I have to close the Searcher and reopen it. Michael
Re: Best Practices for indexing in Web application
Morus Walter wrote: > Michael Steiger writes: > > My problem is that I do not know in advance if or when the "index won't change for some time". I am thinking of running a counter for all updates (add, delete) and optimizing after some threshold. > That's what Lucene does anyway. Adding documents means adding segments, and when the number of segments reaches a certain limit, Lucene merges these segments. Optimizing also just merges the segments, in that case all of them into one. So if you introduce your own calculation of when a merge should be done, you're just doubling the effort. > OTOH, it might be worth optimizing the index at times when there won't be many searches on it (e.g. nightly). Depending on the number of daily updates, this might prevent merges during the day, when there is more search traffic (searches can run in parallel with optimizing, but the additional task on the same machine will slow them down a bit). I was indeed thinking about doing optimization during the night, but only after my application-specific threshold has been reached. > There are two articles about Lucene's indexing details, written by Otis Gospodnetic and linked from the Lucene web page, that might be worth reading, especially the second one (Advanced Text Indexing with Lucene). Thanks, I have already seen these articles. Michael
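Michael's plan (count updates, then optimize at night only once a threshold is reached) might be sketched like this. NightlyOptimizer, recordUpdate() and runIfDue() are hypothetical names; the optimize step is passed in as a Runnable, which in real code would wrap the call to IndexWriter.optimize(). The scheduling uses only java.time and java.util.concurrent and ignores daylight-saving shifts.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: count index updates during the day and optimize at
// night only if the count has reached an application-specific threshold.
final class NightlyOptimizer {
    private final AtomicLong updatesSinceOptimize = new AtomicLong();
    private final long threshold;
    private final Runnable optimize; // in real code: wraps the IndexWriter.optimize() call

    NightlyOptimizer(long threshold, Runnable optimize) {
        this.threshold = threshold;
        this.optimize = optimize;
    }

    // Call from the application's add and delete paths.
    void recordUpdate() {
        updatesSinceOptimize.incrementAndGet();
    }

    // Run by the nightly job; returns true if optimize was actually called.
    boolean runIfDue() {
        long seen = updatesSinceOptimize.get();
        if (seen < threshold) {
            return false;
        }
        optimize.run();
        // Subtract only what we saw, so updates made during the optimize still count.
        updatesSinceOptimize.addAndGet(-seen);
        return true;
    }

    // Time from 'now' until the next occurrence of 'at' (e.g. 02:00).
    static Duration delayUntil(LocalDateTime now, LocalTime at) {
        LocalDateTime next = now.toLocalDate().atTime(at);
        if (!next.isAfter(now)) {
            next = next.plusDays(1);
        }
        return Duration.between(now, next);
    }

    // Runs runIfDue() once a day at 'at' (ignores daylight-saving shifts).
    void scheduleDaily(ScheduledExecutorService scheduler, LocalTime at) {
        long initialDelay = delayUntil(LocalDateTime.now(), at).toMillis();
        scheduler.scheduleAtFixedRate(this::runIfDue, initialDelay,
                TimeUnit.DAYS.toMillis(1), TimeUnit.MILLISECONDS);
    }
}
```

As Morus points out, Lucene already merges segments on its own, so the threshold here only decides whether the extra nightly optimize is worth running; it does not replace Lucene's merging.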
Re: Best Practices for indexing in Web application
Michael Steiger writes: > I am using an IndexSearcher for querying the index, but for deletions I need to use the IndexReader. I now know that I can have Readers and a Writer open concurrently, but IndexReader.delete can only be used if no Writer is open. You should be aware that an IndexSearcher uses a read-only IndexReader, so you can't leave it out of your considerations. > I want to open the IndexSearcher only while searching and close it afterwards. Depends on your application, but if you can, it's better to keep the IndexSearcher open until the index changes. Otherwise you will have to open all the index files for each search. Morus
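Keeping one IndexSearcher open until the index changes can be sketched as a small cache. SearcherCache is hypothetical; the version source and the opener are passed in as functions, and a Closeable again stands in for the searcher. In Lucene releases of that time, a static IndexReader.getCurrentVersion(...) method could serve as the version source, but check that against the javadoc of the version you use. The old searcher is closed immediately on refresh, which is only safe if no request is still reading results from it; otherwise this has to be combined with reference counting.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical cache that keeps one searcher open across requests and
// reopens it only when the index version has changed.
final class SearcherCache<T extends Closeable> {
    private final LongSupplier versionSource; // reports the index's current version
    private final Supplier<T> opener;         // opens a fresh searcher
    private T current;
    private long currentVersion;

    SearcherCache(LongSupplier versionSource, Supplier<T> opener) {
        this.versionSource = versionSource;
        this.opener = opener;
    }

    // Returns the cached searcher, reopening it first if the index changed.
    synchronized T get() {
        long latest = versionSource.getAsLong();
        if (current == null || latest != currentVersion) {
            T fresh = opener.get(); // open the new one before closing the old one
            T old = current;
            current = fresh;
            currentVersion = latest;
            if (old != null) {
                try {
                    // Only safe if no request still uses 'old'.
                    old.close();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
        }
        return current;
    }
}
```

Opening the fresh searcher before closing the old one means a failed open leaves the previous searcher in place instead of leaving the cache empty.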
Re: Best Practices for indexing in Web application
Michael Steiger writes: > My problem is that I do not know in advance if or when the "index won't change for some time". I am thinking of running a counter for all updates (add, delete) and optimizing after some threshold. That's what Lucene does anyway. Adding documents means adding segments, and when the number of segments reaches a certain limit, Lucene merges these segments. Optimizing also just merges the segments, in that case all of them into one. So if you introduce your own calculation of when a merge should be done, you're just doubling the effort. OTOH, it might be worth optimizing the index at times when there won't be many searches on it (e.g. nightly). Depending on the number of daily updates, this might prevent merges during the day, when there is more search traffic (searches can run in parallel with optimizing, but the additional task on the same machine will slow them down a bit). There are two articles about Lucene's indexing details, written by Otis Gospodnetic and linked from the Lucene web page, that might be worth reading, especially the second one (Advanced Text Indexing with Lucene). Morus
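To see why an extra update counter mostly duplicates what Lucene already does, here is a toy model of the merging Morus describes. It is not Lucene's code: real Lucene buffers documents in memory before writing a segment, and its exact merge rules differ between versions. In the model, every added document creates a one-document segment, and whenever the newest mergeFactor segments have the same size they are merged into one, which may cascade to the next size level.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (NOT Lucene's implementation) of automatic segment merging:
// each added document becomes a one-document segment, and whenever the
// newest mergeFactor segments all have the same size they are merged.
final class MergeModel {
    static List<Integer> segmentsAfter(int docs, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be at least 2");
        }
        List<Integer> segments = new ArrayList<>();
        for (int d = 0; d < docs; d++) {
            segments.add(1);
            // Keep merging while the tail holds mergeFactor equal-sized segments;
            // one merge can trigger another at the next size level.
            while (tailCanMerge(segments, mergeFactor)) {
                int n = segments.size();
                int size = segments.get(n - 1);
                segments.subList(n - mergeFactor, n).clear();
                segments.add(size * mergeFactor);
            }
        }
        return segments;
    }

    private static boolean tailCanMerge(List<Integer> segments, int mergeFactor) {
        int n = segments.size();
        if (n < mergeFactor) {
            return false;
        }
        int size = segments.get(n - 1);
        for (int i = n - mergeFactor; i < n; i++) {
            if (segments.get(i) != size) {
                return false;
            }
        }
        return true;
    }
}
```

With mergeFactor 10, 99 documents leave 18 segments (nine of size 10 and nine of size 1), and the 100th document collapses them into a single segment of 100. In this model the segment count stays at most mergeFactor - 1 per size level without any explicit optimize.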
Re: Best Practices for indexing in Web application
Kelvin Tan wrote: > On Mon, 01 Mar 2004 12:24:26 +0100, Michael Steiger said: > > 1. Concurrency of IndexWriter and IndexReader: It seems that it is not allowed to open an IndexWriter and an IndexReader at the same time. But if one user is changing records in the database (and therefore changing documents in the Lucene index) and another user is querying the index, I would need to open them both. > Not necessarily so. Shouldn't you be using an IndexSearcher to query the index instead of an IndexReader? IndexReader should be used primarily for deleting and for low-level document retrieval via Terms. I am using an IndexSearcher for querying the index, but for deletions I need to use the IndexReader. I now know that I can have Readers and a Writer open concurrently, but IndexReader.delete can only be used if no Writer is open. > > 2. Optimizing the index: This may be related to my first issue. I assume that no queries are allowed while the index is being optimized. How often should the index be optimized? > Again, IndexSearcher has no problem with the index being modified (via optimizing or otherwise) while searching. However, you'll need some way of refreshing your IndexSearcher when this happens; otherwise the IndexSearcher will not see the changes. I want to open the IndexSearcher only while searching and close it afterwards. Thanks Michael
Re: Best Practices for indexing in Web application
Otis Gospodnetic wrote: > --- Michael Steiger <[EMAIL PROTECTED]> wrote: > > Hello Lucene Users, I have a web application using Oracle as the database and I want to add full-text query capabilities. My idea was to extend the existing insert, update and delete methods of my backend classes to add, delete/add and delete the corresponding Lucene documents, respectively. I'm new to Lucene, and after a bit of googling and reading I found a few issues which I do not know how to resolve at the moment. 1. Concurrency of IndexWriter and IndexReader: It seems that it is not allowed to open an IndexWriter and an IndexReader at the same time. But if one user is changing records in the database (and therefore changing documents in the Lucene index) and another user is querying the index, I would need to open them both. > This problem should be easily solvable by synchronizing on an object that is used as a shared lock. The misunderstanding was cleared up by another response, so this should be no problem anymore. > > 2. Optimizing the index: This may be related to my first issue. I assume that no queries are allowed while the index is being optimized. How often should the index be optimized? > Queries are allowed while an index is being optimized. You can optimize as often as you'd like, but the recommended approach is to optimize only when you know your index won't change for a while, or if you are running out of file descriptors. This is covered in at least one of the Lucene articles (links on the site). Otis, thanks for your help. Michael
Re: Best Practices for indexing in Web application
Morus Walter wrote: > Michael Steiger writes: > > Hello Lucene Users, I have a web application using Oracle as the database and I want to add full-text query capabilities. My idea was to extend the existing insert, update and delete methods of my backend classes to add, delete/add and delete the corresponding Lucene documents, respectively. I'm new to Lucene, and after a bit of googling and reading I found a few issues which I do not know how to resolve at the moment. 1. Concurrency of IndexWriter and IndexReader: It seems that it is not allowed to open an IndexWriter and an IndexReader at the same time. But if one user is changing records in the database (and therefore changing documents in the Lucene index) and another user is querying the index, I would need to open them both. > No. You can have one IndexWriter and an arbitrary number of IndexReaders that don't write to the index. Despite its name, IndexReader is used for deletions. Since this is a write operation, it cannot be done while an IndexWriter is open. But that's the only limitation. OK, misunderstanding found. I already knew that IndexReader is mostly used for deletions, but I thought that I could not have Reader and Writer objects open at the same time. I now understand that I can have multiple Readers, but the delete operations must be synchronized with open Writers. > IndexReaders will not see changes introduced by writers until they are closed and reopened. And one should be aware that keeping the Readers open after changes means that the associated files are kept open, which may result in too many open files. OK. I only wanted to open a Reader for deletions and close it afterwards, so this should be no problem. > > 2. Optimizing the index: This may be related to my first issue. I assume that no queries are allowed while the index is being optimized. How often should the index be optimized? > AFAIK there's no problem with searching while an index is being optimized. IIRC the FAQ says that optimizing should only be done if you know that your index won't change for some time. As far as I understand Lucene's indexing, the reason is that Lucene will automatically merge the index files (based on MergeFactor), which is what optimizing does. My problem is that I do not know in advance if or when the "index won't change for some time". I am thinking of running a counter for all updates (add, delete) and optimizing after some threshold. Thanks for your help Michael
Re: Best Practices for indexing in Web application
On Mon, 01 Mar 2004 12:24:26 +0100, Michael Steiger said: > 1. Concurrency of IndexWriter and IndexReader: It seems that it is not allowed to open an IndexWriter and an IndexReader at the same time. But if one user is changing records in the database (and therefore changing documents in the Lucene index) and another user is querying the index, I would need to open them both. Not necessarily so. Shouldn't you be using an IndexSearcher to query the index instead of an IndexReader? IndexReader should be used primarily for deleting and for low-level document retrieval via Terms. > 2. Optimizing the index: This may be related to my first issue. I assume that no queries are allowed while the index is being optimized. How often should the index be optimized? Again, IndexSearcher has no problem with the index being modified (via optimizing or otherwise) while searching. However, you'll need some way of refreshing your IndexSearcher when this happens; otherwise the IndexSearcher will not see the changes. Kelvin
Re: Best Practices for indexing in Web application
Michael Steiger writes: > Hello Lucene Users, I have a web application using Oracle as the database and I want to add full-text query capabilities. My idea was to extend the existing insert, update and delete methods of my backend classes to add, delete/add and delete the corresponding Lucene documents, respectively. I'm new to Lucene, and after a bit of googling and reading I found a few issues which I do not know how to resolve at the moment. 1. Concurrency of IndexWriter and IndexReader: It seems that it is not allowed to open an IndexWriter and an IndexReader at the same time. But if one user is changing records in the database (and therefore changing documents in the Lucene index) and another user is querying the index, I would need to open them both. No. You can have one IndexWriter and an arbitrary number of IndexReaders that don't write to the index. Despite its name, IndexReader is used for deletions. Since this is a write operation, it cannot be done while an IndexWriter is open. But that's the only limitation. IndexReaders will not see changes introduced by writers until they are closed and reopened. And one should be aware that keeping the Readers open after changes means that the associated files are kept open, which may result in too many open files. > 2. Optimizing the index: This may be related to my first issue. I assume that no queries are allowed while the index is being optimized. How often should the index be optimized? AFAIK there's no problem with searching while an index is being optimized. IIRC the FAQ says that optimizing should only be done if you know that your index won't change for some time. As far as I understand Lucene's indexing, the reason is that Lucene will automatically merge the index files (based on MergeFactor), which is what optimizing does. HTH Morus
Re: Best Practices for indexing in Web application
--- Michael Steiger <[EMAIL PROTECTED]> wrote: > Hello Lucene Users, I have a web application using Oracle as the database and I want to add full-text query capabilities. My idea was to extend the existing insert, update and delete methods of my backend classes to add, delete/add and delete the corresponding Lucene documents, respectively. I'm new to Lucene, and after a bit of googling and reading I found a few issues which I do not know how to resolve at the moment. 1. Concurrency of IndexWriter and IndexReader: It seems that it is not allowed to open an IndexWriter and an IndexReader at the same time. But if one user is changing records in the database (and therefore changing documents in the Lucene index) and another user is querying the index, I would need to open them both. This problem should be easily solvable by synchronizing on an object that is used as a shared lock. > 2. Optimizing the index: This may be related to my first issue. I assume that no queries are allowed while the index is being optimized. How often should the index be optimized? Queries are allowed while an index is being optimized. You can optimize as often as you'd like, but the recommended approach is to optimize only when you know your index won't change for a while, or if you are running out of file descriptors. This is covered in at least one of the Lucene articles (links on the site). Otis
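Otis's shared-lock suggestion, combined with the point made in this thread that deletions via an IndexReader must not overlap with an open IndexWriter, might be sketched as below. IndexOps, IndexUpdater and RecordingOps are hypothetical names: IndexOps stands in for the real index calls (in Lucene 1.x, adds go through IndexWriter.addDocument and deletions through IndexReader.delete(Term)). All modifications synchronize on one lock object; searches do not take the lock, since an IndexSearcher can keep running while the index is modified.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the real index calls: adds use a writer,
// deletions use a reader (IndexReader.delete(Term) in Lucene 1.x).
interface IndexOps {
    void openWriter();
    void addDocument(String key, String text);
    void closeWriter();
    void openReaderForDelete();
    void deleteByKey(String key);
    void closeReader();
}

// All index modifications synchronize on one shared lock, so a deleting
// reader and a writer are never open at the same time. Searches do not
// take this lock.
final class IndexUpdater {
    private final Object lock = new Object();
    private final IndexOps ops;

    IndexUpdater(IndexOps ops) {
        this.ops = ops;
    }

    void add(String key, String text) {
        synchronized (lock) {
            ops.openWriter();
            try {
                ops.addDocument(key, text);
            } finally {
                ops.closeWriter();
            }
        }
    }

    void delete(String key) {
        synchronized (lock) {
            ops.openReaderForDelete();
            try {
                ops.deleteByKey(key);
            } finally {
                ops.closeReader();
            }
        }
    }

    // A database update becomes delete-then-add; holding the lock across
    // both steps keeps other modifications from slipping in between.
    void update(String key, String text) {
        synchronized (lock) {
            delete(key);
            add(key, text);
        }
    }
}

// Test double: logs calls and fails if the writer/reader rules are broken.
final class RecordingOps implements IndexOps {
    final List<String> log = new ArrayList<>();
    private boolean writerOpen;
    private boolean readerOpen;

    public void openWriter() { check(!readerOpen); writerOpen = true; log.add("openWriter"); }
    public void addDocument(String key, String text) { check(writerOpen); log.add("add " + key); }
    public void closeWriter() { writerOpen = false; log.add("closeWriter"); }
    public void openReaderForDelete() { check(!writerOpen); readerOpen = true; log.add("openReader"); }
    public void deleteByKey(String key) { check(readerOpen); log.add("delete " + key); }
    public void closeReader() { readerOpen = false; log.add("closeReader"); }

    private static void check(boolean ok) {
        if (!ok) {
            throw new IllegalStateException("illegal writer/reader state");
        }
    }
}
```

Opening and closing a writer or reader for every single record keeps the sketch simple but is slow; a real application would probably batch deletions and additions under the same lock.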
Best Practices for indexing in Web application
Hello Lucene Users, I have a web application using Oracle as the database and I want to add full-text query capabilities. My idea was to extend the existing insert, update and delete methods of my backend classes to add, delete/add and delete the corresponding Lucene documents, respectively. I'm new to Lucene, and after a bit of googling and reading I found a few issues which I do not know how to resolve at the moment. 1. Concurrency of IndexWriter and IndexReader: It seems that it is not allowed to open an IndexWriter and an IndexReader at the same time. But if one user is changing records in the database (and therefore changing documents in the Lucene index) and another user is querying the index, I would need to open them both. 2. Optimizing the index: This may be related to my first issue. I assume that no queries are allowed while the index is being optimized. How often should the index be optimized? Thanks in advance Michael