As you mentioned, the ideal would be to support deleting index entries from a live index.
However, for practical purposes, I don't see why the backup instance would not work. When a 'Document' object is deleted from the Lucene index, Lucene only marks it as deleted internally. Nothing is physically removed until you optimize the index again. Hence, if you create a new NutchBean in the backup instance (you can also run a test query so that it loads the Lucene index into memory) and then make it live, any queries after that will be served quickly.

The only issue would be the time lag before the updated index is reflected in queries. This is one reason why the Nutch/Lucene solution is somewhat lacking for indexing quickly changing data.

Praveen.

On 7/20/05, smith learner <[EMAIL PROTECTED]> wrote:
> It doesn't work. The reason, as I said, is that you need
> to initialize the NutchBean after the index entries are removed.
> You can't initialize the backup bean before they are removed. So
> here you only save the time of creating a NutchBean.
>
> Regards,
>
> Smith.
>
> --- Juho Mäkinen <[EMAIL PROTECTED]> wrote:
>
> > Just an idea which came into my mind.
> >
> > The reset jsp page could create a new NutchBean,
> > perform a test query to initialize its internal
> > buffers, connections etc. (I don't know what's in there)
> > and after that, replace the NutchBean instance in
> > the application scope with this newly created one.
> >
> > One problem with this would be the doubled memory
> > requirement during this reset. Also, I don't
> > know if this would work with the clustering systems.
> >
> > - Juho Mäkinen, http://www.juhonkoti.net
> >
> > On 7/20/05, smith learner <[EMAIL PROTECTED]> wrote:
> > >
> > > I am not sure hotswap or backup is a way of solving
> > > this problem. The point here is that after creating a new
> > > NutchBean, the first search will take a long time.
> > > Neither hotswap nor backup can avoid this.
> > > Because you can't initialize a NutchBean before you remove
> > > some entries from the index database, and after the removal
> > > it is meaningless to initialize the NutchBean.
> > >
> > > I don't know whether the cache is in Nutch or Lucene. It
> > > seems that Nutch only caches the query filter. I suspect
> > > the vital cache is in Lucene. If that is true, I think
> > > it may be better to improve the current delete function:
> > > first delete the entry from the cache and then delete it
> > > from the index database.
> > >
> > > Regards,
> > >
> > > smith.
> > >
> > > --- praveen pathiyil <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hi,
> > > >
> > > > I am not sure of the exact layout of the code, but
> > > > the NutchBean instance stores information related to the
> > > > segments on the file system in memory, which is
> > > > initialized at startup. This is the reason why
> > > > changes at runtime are not reflected before a
> > > > restart of the server.
> > > >
> > > > In a production setup, there might be some
> > > > workarounds for you. The NutchBean instance is stored
> > > > in 'application-scope'. So if you can add a jsp page
> > > > or a servlet as part of the interface which sets the
> > > > NutchBean instance reference to null, it will cause
> > > > the re-initialization of the NutchBean instance.
> > > >
> > > > If you don't want to touch the nutch code, another
> > > > option would be to use two instances of tomcat (hot
> > > > and backup, or whatever the combination is called).
> > > > Whenever you have a change to the index, restart the
> > > > backup and then make it the hot one. You will need
> > > > some kind of cgi or other script which processes the
> > > > requests (to direct the requests).
> > > >
> > > > Hope this will serve as some pointers,
> > > > Praveen.
> > > >
> > > > On 7/19/05, smith learner <[EMAIL PROTECTED]> wrote:
> > > > > Thank you for your reply.
> > > > >
> > > > > I think they missed an important feature, because
> > > > > there is always a need to filter out something (for
> > > > > example adult web pages). And you can't expect to
> > > > > restart the server every time you filter out these
> > > > > things.
> > > > >
> > > > > Regards,
> > > > >
> > > > > Jack.
> > > > >
> > > > > --- praveen pathiyil <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > As far as I know, any change in the index is
> > > > > > reflected only after you restart tomcat.
> > > > > >
> > > > > > On 7/18/05, smith learner <[EMAIL PROTECTED]> wrote:
> > > > > > > I ran nutch on tomcat. I searched for a document and
> > > > > > > later I deleted the document from the index (I mean
> > > > > > > deleting the entry from the index database), but I
> > > > > > > can still get the document through nutch. I suspect
> > > > > > > that it is because of a cache. If that is true, how
> > > > > > > can I renew the cache without stopping tomcat?
> > > > > > >
> > > > > > > Regards,
> > > > > > >
> > > > > > > smith.
> > > > > > > _______________________________________________
> > > > > > > Nutch-general mailing list
> > > > > > > [email protected]
> > > > > > > https://lists.sourceforge.net/lists/listinfo/nutch-general
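
For reference, the warm-then-swap idea discussed in this thread (create a new bean, run a test query against it, then replace the live instance) could look roughly like the Java sketch below. SearchBean, BeanHolder and reload are hypothetical names used only for illustration, not Nutch APIs. A real setup would wrap an actual NutchBean and the servlet application scope instead, and I have not checked how that behaves in practice.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical stand-in for NutchBean: anything that can answer a query.
interface SearchBean {
    List<String> search(String query);
}

// Holds the live bean; plays the role of the application-scope attribute.
final class BeanHolder {
    private final AtomicReference<SearchBean> live = new AtomicReference<>();

    BeanHolder(SearchBean initial) {
        live.set(initial);
    }

    // The bean that incoming queries should use right now.
    SearchBean current() {
        return live.get();
    }

    // Build a fresh bean, warm it with one test query, then publish it.
    // Queries keep hitting the old bean until the swap happens.
    // Returns the old bean so the caller can release its resources.
    SearchBean reload(Supplier<SearchBean> factory, String warmUpQuery) {
        SearchBean fresh = factory.get();
        fresh.search(warmUpQuery);      // warm-up; the result is discarded
        return live.getAndSet(fresh);   // atomic swap of the live reference
    }
}

final class WarmSwapDemo {
    public static void main(String[] args) {
        // Old bean still "sees" doc2, which has since been deleted from the index.
        BeanHolder holder = new BeanHolder(q -> List.of("doc1", "doc2"));

        // The new bean is created after the deletion, so it no longer returns doc2.
        holder.reload(() -> q -> List.of("doc1"), "test");

        System.out.println(holder.current().search("anything"));
    }
}
```

Both beans are alive during the reload, so memory use roughly doubles for that period, as Juho points out. The warm-up only helps if the new bean is created after the deletions have been applied to the index, which is the objection smith raises.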
