Re: StandardDirectoryReader.java:: applyAllDeletes, writeAllDeletes
If you are not using NRT readers, then the applyAllDeletes and writeAllDeletes boolean values are completely unused, and should have no impact on your performance.

Mike McCandless

http://blog.mikemccandless.com

On Sun, May 28, 2017 at 8:34 PM, Nawab Zada Asad Iqbal wrote:
> After reading some more code, it seems that if we are sure there are no deletes in this segment/index, then setting applyAllDeletes and writeAllDeletes both to false will achieve something similar to what I was getting in 4.5.0.
>
> However, after I read the comment from IndexWriter::DirectoryReader getReader(boolean applyAllDeletes, boolean writeAllDeletes), it seems that this method is particular to NRT. Since we are not using soft commits, can this change actually improve our performance during a full reindex?
>
> Thanks
> Nawab
Re: StandardDirectoryReader.java:: applyAllDeletes, writeAllDeletes
After reading some more code, it seems that if we are sure there are no deletes in this segment/index, then setting applyAllDeletes and writeAllDeletes both to false will achieve something similar to what I was getting in 4.5.0.

However, after I read the comment from IndexWriter::DirectoryReader getReader(boolean applyAllDeletes, boolean writeAllDeletes), it seems that this method is particular to NRT. Since we are not using soft commits, can this change actually improve our performance during a full reindex?

Thanks
Nawab

On Sun, May 28, 2017 at 2:16 PM, Nawab Zada Asad Iqbal wrote:
> Thanks Michael and Shawn for the detailed response. I was later able to pull the full history using gitk, and found the commits behind this patch.
>
> Mike:
>
> So, in Solr 4.5.0, some earlier developer added code and config to set applyAllDeletes to false when we reindex all the data. At the moment, I am not sure about the performance gain from this.
>
> I am investigating whether this change is still needed in 6.5.1, or whether it can be achieved by some other configuration.
>
> For now, we are not planning to use NRT and SolrCloud.
>
> Thanks
> Nawab
Re: StandardDirectoryReader.java:: applyAllDeletes, writeAllDeletes
Thanks Michael and Shawn for the detailed response. I was later able to pull the full history using gitk, and found the commits behind this patch.

Mike:

So, in Solr 4.5.0, some earlier developer added code and config to set applyAllDeletes to false when we reindex all the data. At the moment, I am not sure about the performance gain from this.

I am investigating whether this change is still needed in 6.5.1, or whether it can be achieved by some other configuration.

For now, we are not planning to use NRT and SolrCloud.

Thanks
Nawab

On Sun, May 28, 2017 at 9:26 AM, Michael McCandless <luc...@mikemccandless.com> wrote:
> Sorry, yes, that commit was one of many on a feature branch I used to work on LUCENE-5438, which added near-real-time index replication to Lucene. Before this change, Lucene's replication module required a commit in order to replicate, which is a heavy operation.
>
> The writeAllDeletes boolean option asks Lucene to move all recent deletes (tombstone bitsets) to disk while opening the NRT (near-real-time) reader.
>
> Normally Lucene won't always do that, and will instead carry the bitsets in memory from writer to reader, for reduced refresh latency.
>
> What sort of custom changes do you have in this part of Lucene?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
Re: StandardDirectoryReader.java:: applyAllDeletes, writeAllDeletes
Sorry, yes, that commit was one of many on a feature branch I used to work on LUCENE-5438, which added near-real-time index replication to Lucene. Before this change, Lucene's replication module required a commit in order to replicate, which is a heavy operation.

The writeAllDeletes boolean option asks Lucene to move all recent deletes (tombstone bitsets) to disk while opening the NRT (near-real-time) reader.

Normally Lucene won't always do that, and will instead carry the bitsets in memory from writer to reader, for reduced refresh latency.

What sort of custom changes do you have in this part of Lucene?

Mike McCandless

http://blog.mikemccandless.com

On Sat, May 27, 2017 at 10:35 PM, Nawab Zada Asad Iqbal wrote:
> Hi all
>
> I am looking at the following change in lucene-solr, which doesn't mention any JIRA. How can I know more about it?
>
> "1ae7291 Mike McCandless on 1/24/16 at 3:17 PM current patch"
>
> Specifically, I am interested in what 'writeAllDeletes' does in the following method. Let me know if it is a very stupid question and I should have done something else before emailing here.
>
> static DirectoryReader open(IndexWriter writer, SegmentInfos infos, boolean applyAllDeletes, boolean writeAllDeletes) throws IOException {
>
> Background: We are running Solr 4.5 and upgrading to 6.5.1. We have some custom code in this area, which we need to merge.
>
> Thanks
> Nawab
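
Editor's sketch: the difference Mike describes, between carrying delete bitsets in memory and writing them to disk when a reader is opened, can be illustrated with a small toy model. This is not Lucene code. The class names (TombstoneSketch, Disk, Writer, Reader) and the openReader method are hypothetical and exist only for illustration; the model assumes that deletes are tracked as one bitset of document IDs and that "disk" is just an object that counts writes. It also rejects writeAllDeletes without applyAllDeletes purely to keep the toy simple.

```java
import java.util.BitSet;

/** Toy model only, not Lucene code. All names here are hypothetical. */
public class TombstoneSketch {

    /** Stand-in for the on-disk copy of a segment's deleted-docs bitset. */
    static final class Disk {
        BitSet persistedDeletes = new BitSet();
        int writes = 0; // counts how often deletes were flushed

        void write(BitSet deletes) {
            persistedDeletes = (BitSet) deletes.clone();
            writes++;
        }
    }

    /** Point-in-time view: sees only the deletes it was opened with. */
    static final class Reader {
        private final BitSet deleted;

        Reader(BitSet deleted) {
            this.deleted = deleted;
        }

        boolean isLive(int docId) {
            return !deleted.get(docId);
        }
    }

    /** Stand-in for a writer that buffers recent deletes in memory. */
    static final class Writer {
        private final Disk disk;
        private final BitSet pendingDeletes = new BitSet();

        Writer(Disk disk) {
            this.disk = disk;
        }

        void delete(int docId) {
            pendingDeletes.set(docId);
        }

        /**
         * applyAllDeletes: make buffered deletes visible to the new reader.
         * writeAllDeletes: additionally flush the resulting bitset to "disk".
         */
        Reader openReader(boolean applyAllDeletes, boolean writeAllDeletes) {
            if (writeAllDeletes && !applyAllDeletes) {
                // Toy simplification: nothing new to write if deletes are not applied.
                throw new IllegalArgumentException("toy model needs applyAllDeletes with writeAllDeletes");
            }
            BitSet visible = (BitSet) disk.persistedDeletes.clone();
            if (applyAllDeletes) {
                visible.or(pendingDeletes); // hand the in-memory bitset to the reader
            }
            if (writeAllDeletes) {
                disk.write(visible); // extra I/O at reader-open time
            }
            return new Reader(visible);
        }
    }

    public static void main(String[] args) {
        Disk disk = new Disk();
        Writer writer = new Writer(disk);
        writer.delete(3);

        Reader inMemory = writer.openReader(true, false);
        System.out.println("doc 3 live: " + inMemory.isLive(3) + ", disk writes: " + disk.writes);

        Reader flushed = writer.openReader(true, true);
        System.out.println("doc 3 live: " + flushed.isLive(3) + ", disk writes: " + disk.writes);
    }
}
```

In this toy, both readers see document 3 as deleted; the only difference is whether the bitset was also written out when the reader was opened, which is the extra cost that the in-memory handoff avoids.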
Re: StandardDirectoryReader.java:: applyAllDeletes, writeAllDeletes
On 5/27/2017 8:35 PM, Nawab Zada Asad Iqbal wrote:
> I am looking at the following change in lucene-solr, which doesn't mention any JIRA. How can I know more about it?
>
> "1ae7291 Mike McCandless on 1/24/16 at 3:17 PM current patch"

The reason that there's no Jira issue mentioned is that the commit was first made on an entirely separate git branch: the one for LUCENE-5438. I think I would argue that even commits on special branches should have the Jira issue in the commit text, because eventually those commits get merged back to master, where information about the source branch seems to disappear.

https://issues.apache.org/jira/browse/LUCENE-5438

LUCENE-5438 added an HTTP-based replication capability that Lucene users could leverage, similar to what Solr already had.

> Specifically, I am interested in what 'writeAllDeletes' does in the following method. Let me know if it is a very stupid question and I should have done something else before emailing here.
>
> static DirectoryReader open(IndexWriter writer, SegmentInfos infos, boolean applyAllDeletes, boolean writeAllDeletes) throws IOException {
>
> Background: We are running Solr 4.5 and upgrading to 6.5.1. We have some custom code in this area, which we need to merge.

I found this commit in the archive for the commits mailing list. The full commit hash is 1ae7291429bad742715344f86cfa5200229b3698.

https://git1-us-west.apache.org/repos/asf?p=lucene-solr.git;a=commit;h=1ae72914

This was a change in Lucene code; Solr code wasn't touched.

The author of that change, Mike McCandless, is a participant on this list. There are posts from him as recently as April 2017. Because of that, I hesitate to have you ask your question on the dev list, but if you don't get a useful reply from somebody soon, you may want to do that.

I wish I could offer you some advice myself, but I'm not familiar with the low-level Lucene code.

Thanks,
Shawn