Move from RAMDirectory to FSDirectory causing problem sometimes
Hi, I have been using a RAMDirectory for indexing without any problem, but I then moved to a file-based directory to reduce memory usage. This has been working fine on Windows, OS X and my version of Linux (Red Hat), but it is failing on one version of Linux (Arch Linux) with 'Too many files opened', and that user is only indexing 32 documents; I can index thousands without a problem. The Lucene FAQ mentions this error, but I am not dealing directly with the filesystem myself. This is my code for creating an index; is it okay, or is there some kind of close that I am missing? Thanks for any help.

Paul

    public synchronized void reindex() {
        MainWindow.logger.info("Reindex start:" + new Date());
        TableModel tableModel = table.getModel();
        try {
            //Recreate the directory (a RAMDirectory uses too much memory)
            //directory = new RAMDirectory();
            directory = FSDirectory.getDirectory(Platform.getPlatformLicenseFolder() + "/" + TAG_BROWSER_INDEX);
            IndexWriter writer = new IndexWriter(directory, analyzer, true);
            //Iterate through all rows
            for (int row = 0; row < tableModel.getRowCount(); row++) {
                //for each row make a new document
                Document document = createDocument(row);
                writer.addDocument(document);
            }
            writer.optimize();
            writer.close();
        } catch (Exception e) {
            throw new RuntimeException("Problem indexing Data:" + e.getMessage());
        }
    }
'deletable' indexing files are not deleted on RHEL5
Hi,

I'm using Lucene on a RHEL5 box. The index folder is growing extremely large, more than 20 GB, with a lot of 'deletable' index files. It runs out of disk space, and I have to clear the entire folder and start indexing from scratch. The code ran fine before I moved it onto RHEL5. Does that matter? Can anyone give some suggestions on how to solve this issue?

Thanks in advance.

Best Regards,
Frank Dai (Dai Zhoulin 戴周林)
Lotus Connections - Dogear Development, WPLC
China Development Lab, IBM Shanghai
TEL: (86-21)60928189
Internet ID: [EMAIL PROTECTED]
Addr: 4F, No 78, Lane 887, Zu Chong Zhi Road, Zhang Jiang High Tech Park, 201203, Shanghai, China
My Blog: http://www.daizhoulin.com/wordpress
Re: 'deletable' indexing files are not deleted on RHEL5
What do you mean by "deletable" indexing files?

Moving to RHEL5 should have no effect (vs. other platforms) on how much disk space is used. However, Lucene's disk usage can be surprising: while merging segments it will temporarily require free space equal to the size of the resulting merged segment. For a large merge this can be a sizable percentage of your total index size.

Mike

Zhou Lin Dai wrote:
> I'm using Lucene on a RHEL5 box. The index folder is growing extremely large, more than 20 GB, with a lot of 'deletable' index files. [...]
Re: Move from RAMDirectory to FSDirectory causing problem sometimes
Technically you should call directory.close() as well, but missing that will not lead to too many open files.

How often is that RuntimeException being thrown? E.g., if a single document is frequently hitting an exception during analysis, your code doesn't close the IndexWriter in that situation. It's better to use a try/finally and close the IndexWriter in the finally clause, to cover that case.

Are you sure nothing else is using up file descriptors? E.g., does the createDocument call open any files?

Mike

Paul Taylor wrote:
> I have been using a RAMDirectory for indexing without any problem, but I then moved to a file-based directory to reduce memory usage. [...] it is failing on one version of Linux (Arch Linux) with 'Too many files opened' [...]
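For illustration, a minimal sketch of how the reindex() method from the original post could be restructured with try/finally, keeping the same Lucene 2.x API and the poster's own fields and helpers (directory, analyzer, table, createDocument, TAG_BROWSER_INDEX; assumes java.io.IOException is imported, and the logging in the finally clause is an assumption about how one might report the failure):

    public synchronized void reindex() {
        MainWindow.logger.info("Reindex start:" + new Date());
        IndexWriter writer = null;
        try {
            directory = FSDirectory.getDirectory(Platform.getPlatformLicenseFolder() + "/" + TAG_BROWSER_INDEX);
            writer = new IndexWriter(directory, analyzer, true);
            TableModel tableModel = table.getModel();
            for (int row = 0; row < tableModel.getRowCount(); row++) {
                writer.addDocument(createDocument(row));
            }
            writer.optimize();
        } catch (Exception e) {
            throw new RuntimeException("Problem indexing Data:" + e.getMessage());
        } finally {
            //Release the writer's file handles even if a document failed to index
            if (writer != null) {
                try {
                    writer.close();
                } catch (IOException ioe) {
                    MainWindow.logger.warning("Could not close IndexWriter:" + ioe);
                }
            }
        }
    }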
Re: Move from RAMDirectory to FSDirectory causing problem sometimes
Michael McCandless wrote:
> Technically you should call directory.close() as well, but missing that will not lead to too many open files. How often is that RuntimeException being thrown? [...] Are you sure nothing else is using up file descriptors? E.g., does the createDocument call open any files?

The RuntimeException is occurring all the time; I'm waiting for some more information from the user. Since the post I've added directory.close() too. I thought this would cause a problem when I call IndexSearcher with it as a parameter, but it seems to still work; the documentation is not very clear on this point. I see your point about the try/finally, I'll make that change. There are many other parts of the code that use file descriptors, but the problem never occurred before moving to a FSDirectory.

Thanks,
Paul

Here's an example of my search code; is this OK?

    public boolean recNoColumnMatchesSearch(Integer columnId, Integer recNo, String search) {
        try {
            IndexSearcher is = new IndexSearcher(directory);
            //Build a query based on the fields, searchString and standard analyzer
            QueryParser parser = new QueryParser(String.valueOf(columnId) + INDEXED, analyzer);
            Query query = parser.parse(search);
            MainWindow.logger.finer("Parsed Search Query Is" + query.toString() + "of type:" + query.getClass());
            //Create a filter to restrict the search to one row
            Filter filter = new QueryFilter(new TermQuery(new Term(ROW_NUMBER, String.valueOf(recNo))));
            //run the search
            Hits hits = is.search(query, filter);
            Iterator i = hits.iterator();
            if (i.hasNext()) {
                return true;
            }
        } catch (ParseException pe) {
            //Problem with syntax: rather than throwing an exception and causing everything to stop,
            //we just log and return false
            MainWindow.logger.warning("Search Query invalid:" + pe.getMessage());
            return false;
        } catch (IOException e) {
            MainWindow.logger.warning("DataIndexer: Unable to perform recNo match search:" + search + ":" + e);
        }
        return false;
    }
Re: Move from RAMDirectory to FSDirectory causing problem sometimes
Hmmm, you should not close the directory if you are then going to use it to instantiate a searcher.

Also, your code never closes the searcher. I think that is most likely the source of your file descriptor leaks.

Mike

Paul Taylor wrote:
> Here's an example of my search code; is this OK? [...]
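For instance, a sketch of that search method with the searcher closed in a finally block, keeping the Lucene 2.x calls and field names from the thread (the finally clause and the Hits.length() shortcut are the additions; logging is trimmed for brevity):

    public boolean recNoColumnMatchesSearch(Integer columnId, Integer recNo, String search) {
        IndexSearcher is = null;
        try {
            is = new IndexSearcher(directory);
            QueryParser parser = new QueryParser(String.valueOf(columnId) + INDEXED, analyzer);
            Query query = parser.parse(search);
            Filter filter = new QueryFilter(new TermQuery(new Term(ROW_NUMBER, String.valueOf(recNo))));
            Hits hits = is.search(query, filter);
            return hits.length() > 0;
        } catch (ParseException pe) {
            MainWindow.logger.warning("Search Query invalid:" + pe.getMessage());
            return false;
        } catch (IOException e) {
            MainWindow.logger.warning("Unable to perform recNo match search:" + search + ":" + e);
            return false;
        } finally {
            //Closing the searcher releases its underlying IndexReader and file handles
            if (is != null) {
                try {
                    is.close();
                } catch (IOException ignored) {
                }
            }
        }
    }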
Re: Move from RAMDirectory to FSDirectory causing problem sometimes
Also, if possible, you should share the IndexSearcher across multiple searches (i.e., don't open/close a new one per search). Opening an IndexSearcher can be a resource-intensive operation, so you'll see better throughput if you share. (Though in your particular situation it may not matter.)

Mike

Paul Taylor wrote:
> Here's an example of my search code; is this OK? [...]
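A sketch of the sharing pattern under the same assumptions (the holder methods and their names are hypothetical, not part of the poster's code):

    private IndexSearcher searcher; //shared across queries

    private synchronized IndexSearcher getSearcher() throws IOException {
        if (searcher == null) {
            searcher = new IndexSearcher(directory);
        }
        return searcher;
    }

    //Call after reindex() so new searches see the rebuilt index
    private synchronized void replaceSearcher() throws IOException {
        if (searcher != null) {
            searcher.close();
        }
        searcher = new IndexSearcher(directory);
    }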
Re: Move from RAMDirectory to FSDirectory causing problem sometimes
Michael McCandless wrote:
> Hmmm, you should not close the directory if you are then going to use it to instantiate a searcher.

How come it works?

> Also, your code never closes the searcher. I think that is most likely the source of your file descriptor leaks.

OK, fixed.

Paul
Re: 'deletable' indexing files are not deleted on RHEL5
Assuming your indexing completes, what is the size of your index after the whole thing is done and the process terminates? Is it possible that your old box had lots more disk space and you just never noticed the (perhaps temporary) disk space usage?

Best
Erick

2008/7/8 Zhou Lin Dai <[EMAIL PROTECTED]>:
> I'm using Lucene on a RHEL5 box. The index folder is growing extremely large, more than 20 GB, with a lot of 'deletable' index files. [...]
Re: How to make documents clustering and topic classification with lucene
Use Carrot2: http://project.carrot2.org/

For Lucene + Carrot2: http://project.carrot2.org/faq.html#lucene-integration

-glen

2008/7/7 Ariel <[EMAIL PROTECTED]>:
> Hi everybody: Does anyone have an idea how to do document clustering and topic classification using Lucene? Is there any way to do this? Please, I need help. Thanks everybody. Ariel
Readers synchronization
According to SVN history, in the next version this will be available:

    LUCENE-1044: IndexWriter with autoCommit=true now commits (such that a reader can see the changes) far less often than it used to. Previously, every flush was also a commit. You can always force a commit by calling IndexWriter.commit(). Furthermore, in 3.0, autoCommit will be hardwired to false (IndexWriter constructors that take an autoCommit argument have been deprecated). (Mike McCandless)

Does this mean that I won't need to reopen all the readers in order to see the index changes?

Thanks
Re: Move from RAMDirectory to FSDirectory causing problem sometimes
It works because Lucene doesn't currently check for it, and because closing an FSDirectory does not actually make it unusable. In fact, it also doesn't catch a double-close call.

But it may cause subtle problems, because FSDirectory has this invariant: only a single instance of FSDirectory exists per canonical directory in the filesystem. This allows code to synchronize on that instance and be sure no other code in the same JVM is also working in that canonical directory. When you close an FSDirectory but keep using it, you can get to a point where this invariant is broken.

That said, besides IndexModifier (which is now deprecated), I can't find anything that would actually break when this invariant is broken. Still, I think we should put protection in to catch double-closing and prevent using a closed directory. I'll open an issue.

Mike

Paul Taylor wrote:
> How come it works? [...]
Re: Readers synchronization
No, that's not changed. You must still reopen an IndexReader to see changes to the index. An IndexReader always searches a point-in-time snapshot of the index.

LUCENE-1044 does mean that you should call IndexWriter.commit() (or close the writer) to ensure all changes you've made become visible to the reader.

Mike

Eric Diaz wrote:
> Does this mean that I won't need to reopen all the readers in order to see the index changes? [...]
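In code, the pattern looks roughly like this: a sketch assuming the Lucene 2.4 API that LUCENE-1044 ships in (IndexWriter.commit() and IndexReader.reopen()); writer, reader and doc are placeholders:

    writer.addDocument(doc);
    writer.commit(); //make the changes visible to newly (re)opened readers

    //existing readers keep their snapshot; reopen to see the commit
    IndexReader newReader = reader.reopen();
    if (newReader != reader) {
        reader.close(); //reopen() returns the same instance if nothing changed
        reader = newReader;
    }
    IndexSearcher searcher = new IndexSearcher(newReader);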
Re: Move from RAMDirectory to FSDirectory causing problem sometimes
OK, I opened: https://issues.apache.org/jira/browse/LUCENE-1331

Mike
Re: Readers synchronization
Besides the warm-up that the FAQ section suggests (used in Solr), is there another technique or solution to have an IndexReader/IndexSearcher with an updated view of an index under a concurrent scenario (web app)?

Thanks

--- On Tue, 7/8/08, Michael McCandless <[EMAIL PROTECTED]> wrote:
> No, that's not changed. You must still reopen an IndexReader to see changes to the index. An IndexReader always searches a point-in-time snapshot of the index. [...]
Re: Readers synchronization
No other techniques that I know of...

But there is ongoing discussion/work towards making reopening a reader much less costly. E.g., repopulating the field cache after reopen is a costly operation now, but this issue:

https://issues.apache.org/jira/browse/LUCENE-1231

would make that cost proportional to the number and size of the changed segments since you last reopened.

There have also been discussions on creating an IndexReader implementation that can directly search the RAM buffer in IndexWriter, which should give very fast turnaround in searching just-indexed documents, but that is quite a ways off...

Mike

Eric Diaz wrote:
> Besides the warm-up that the FAQ section suggests (used in Solr), is there another technique or solution to have an IndexReader/IndexSearcher with an updated view of an index under a concurrent scenario (web app)? [...]
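For a web app today, one common shape is a single shared searcher refreshed periodically from a background thread. A sketch, assuming Lucene 2.4's IndexReader.reopen() and a hypothetical holder class (the field and method names are illustrative only):

    private volatile IndexSearcher current; //queries read this without locking

    private synchronized void maybeRefresh() throws IOException {
        IndexReader oldReader = current.getIndexReader();
        IndexReader newReader = oldReader.reopen(); //only loads changed segments
        if (newReader != oldReader) {
            IndexSearcher newSearcher = new IndexSearcher(newReader);
            //warm newSearcher here (run a few typical queries) before swapping it in
            IndexSearcher old = current;
            current = newSearcher;
            old.getIndexReader().close();
        }
    }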
Re: Readers synchronization
Is there any plan to change this behavior, meaning that by default a reader will see the current index?

Thanks in advance

--- On Tue, 7/8/08, Michael McCandless <[EMAIL PROTECTED]> wrote:
> No other techniques that I know of... But there is ongoing discussion/work towards making reopening a reader much less costly. [...]
Re: Readers synchronization
Not that I know of.

Mike

Eric Diaz wrote:
> Is there any plan to change this behavior, meaning that by default a reader will see the current index? [...]
How to handle frequent updates.
Hi there,

I know Lucene is for indexing and not for frequent updates and deletes, but I have been using Lucene to store my matrix as a document. Since with my algorithm the values of the matrix can change, I am updating the values. But for this I have to close and reopen the IndexReader, and in addition the reader is not able to see the documents held in the RAM directory or buffer in the IndexWriter (i.e. documents held in memory by the IndexWriter due to the other parameters set for it), so eventually I have to optimize or flush the writer and reopen the reader to get accurate results.

Is there some workaround for this type of job? Can anyone suggest any other open source API?

Thank You
miztaken
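A sketch of the usual update pattern on the Lucene 2.x API the poster is using: give each matrix document a unique id field, replace it in place with IndexWriter.updateDocument(), and batch the reopen. The "id" field name and the matrixId/newDocument variables are hypothetical:

    //updateDocument atomically deletes any document matching the term, then adds the new one
    writer.updateDocument(new Term("id", matrixId), newDocument);

    //batch many updates, then make them searchable in one step
    writer.flush(); //or writer.commit() on Lucene 2.4+
    IndexReader newReader = reader.reopen(); //Lucene 2.4+; cheaper than a full close/open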
boolean query or
Hello,

Is it possible to make a boolean query where a word is matched against fieldA or fieldB? In other words, I would like to search for a word in two fields: if the word occurs in fieldA or fieldB, then it is a hit.

Best,
-C.B.
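For what it's worth, a sketch of what is being asked, using a BooleanQuery with two SHOULD clauses on the Lucene 2.x API (field names are taken from the question; the word variable is a placeholder):

    BooleanQuery query = new BooleanQuery();
    query.add(new TermQuery(new Term("fieldA", word)), BooleanClause.Occur.SHOULD);
    query.add(new TermQuery(new Term("fieldB", word)), BooleanClause.Occur.SHOULD);
    //a document matches if the word occurs in fieldA OR fieldB

MultiFieldQueryParser offers similar behavior when parsing free-form query strings across several fields.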