Re: StandardDirectoryReader.java:: applyAllDeletes, writeAllDeletes

2017-05-29 Thread Michael McCandless
If you are not using NRT readers, then the applyAllDeletes/writeAllDeletes
boolean values are completely unused (and should have no impact on your
performance).
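To make this concrete, here is a toy model in plain Java that uses only java.util.BitSet. ToySegment, openFromCommit and openNrt are hypothetical names and are not Lucene APIs. The point it illustrates is that a reader opened from the last commit takes no delete flags at all, so only the NRT path can be affected by them. One simplification: here applyAllDeletes=false always skips the buffered deletes. Lucene's javadoc describes false as letting the writer skip the possibly costly step of applying buffered deletes, so some deleted documents may still be visible to that reader.

```java
import java.util.BitSet;

// Hypothetical toy model (not Lucene's API): only the NRT path takes a delete flag.
final class NrtFlagsDemo {

    static final class ToySegment {
        // Tombstones (deleted doc IDs) recorded by the last commit.
        private final BitSet committedDeletes = new BitSet();
        // Deletes made since the last commit, still buffered in the "writer".
        private final BitSet bufferedDeletes = new BitSet();

        void deleteDocument(int docId) {
            bufferedDeletes.set(docId);
        }

        void commit() {
            committedDeletes.or(bufferedDeletes);
            bufferedDeletes.clear();
        }

        // Non-NRT reader: sees only the last commit and takes no delete flags at all.
        BitSet openFromCommit() {
            return (BitSet) committedDeletes.clone();
        }

        // NRT reader opened from the writer. Simplified: false always skips the
        // buffered deletes, so those deleted docs stay visible to this reader.
        BitSet openNrt(boolean applyAllDeletes) {
            BitSet deleted = (BitSet) committedDeletes.clone();
            if (applyAllDeletes) {
                deleted.or(bufferedDeletes);
            }
            return deleted;
        }
    }

    public static void main(String[] args) {
        ToySegment segment = new ToySegment();
        segment.deleteDocument(1);
        segment.commit();
        segment.deleteDocument(3); // buffered, not yet committed

        System.out.println("commit reader:               " + segment.openFromCommit()); // {1}
        System.out.println("NRT, applyAllDeletes=true:   " + segment.openNrt(true));    // {1, 3}
        System.out.println("NRT, applyAllDeletes=false:  " + segment.openNrt(false));   // {1}
    }
}
```

In this model, a full reindex that never opens NRT readers only ever goes through openFromCommit, so the flag has nothing to act on.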

Mike McCandless

http://blog.mikemccandless.com


Re: StandardDirectoryReader.java:: applyAllDeletes, writeAllDeletes

2017-05-28 Thread Nawab Zada Asad Iqbal
After reading some more code, it seems that if we are sure there are no
deletes in this segment/index, then setting both applyAllDeletes and
writeAllDeletes to false will achieve something similar to what I was
getting in 4.5.0.

However, after reading the comment on IndexWriter's DirectoryReader
getReader(boolean applyAllDeletes, boolean writeAllDeletes), it seems that
this method is specific to NRT. Since we are not using soft commits, can
this change actually improve our performance during a full reindex?


Thanks
Nawab


Re: StandardDirectoryReader.java:: applyAllDeletes, writeAllDeletes

2017-05-28 Thread Nawab Zada Asad Iqbal
Thanks, Michael and Shawn, for the detailed responses. I was later able to
pull the full history using gitk and found the commits behind this patch.

Mike:

So, in Solr 4.5.0, an earlier developer added code and config to set
applyAllDeletes to false when we reindex all the data. At the moment, I am
not sure what performance gain this gives.

I am investigating whether this change is still needed in 6.5.1, or whether
the same effect can be achieved with some other configuration.

For now, we are not planning to use NRT or SolrCloud.


Thanks
Nawab


Re: StandardDirectoryReader.java:: applyAllDeletes, writeAllDeletes

2017-05-28 Thread Michael McCandless
Sorry, yes, that commit was one of many on a feature branch I used to work
on LUCENE-5438, which added near-real-time index replication to Lucene.
Before this change, Lucene's replication module required a commit in order
to replicate, which is a heavy operation.

The writeAllDeletes boolean option asks Lucene to move all recent deletes
(tombstone bitsets) to disk while opening the NRT (near-real-time) reader.

Normally Lucene won't necessarily do that, and will instead carry the
bitsets in memory from writer to reader, to reduce refresh latency.
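A toy sketch of that trade-off in plain Java. ToyWriter, openNrtReader and the in-memory "directory" map are hypothetical stand-ins, not Lucene classes. With writeAllDeletes=false the reader gets an in-memory copy of the tombstone bitset. With true, the bitset is first serialized to a "file" and the reader loads it back. Both readers see the same deleted docs; the difference is the extra write, which is presumably what a file-copying replica needs in order to see recent deletes.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of the writeAllDeletes trade-off; not Lucene's implementation.
final class WriteAllDeletesDemo {

    static final class ToyWriter {
        // Recent deletes (tombstones) held in the writer's heap.
        final BitSet heapDeletes = new BitSet();
        // Stand-in for the index directory: file name -> file bytes.
        final Map<String, byte[]> directory = new HashMap<>();
        private int generation = 0;

        void deleteDocument(int docId) {
            heapDeletes.set(docId);
        }

        // Returns the deleted-docs view that a newly opened NRT reader would use.
        BitSet openNrtReader(boolean writeAllDeletes) {
            if (writeAllDeletes) {
                // Write the tombstones down as a new "file", then load the reader's copy from it.
                String fileName = "deletes_" + (++generation) + ".bin"; // hypothetical name
                directory.put(fileName, heapDeletes.toByteArray());
                return BitSet.valueOf(directory.get(fileName));
            }
            // Default: carry an in-memory copy from writer to reader, with no file I/O.
            return (BitSet) heapDeletes.clone();
        }
    }

    public static void main(String[] args) {
        ToyWriter writer = new ToyWriter();
        writer.deleteDocument(2);
        writer.deleteDocument(5);

        BitSet carried = writer.openNrtReader(false);
        System.out.println(carried + " files=" + writer.directory.size()); // {2, 5} files=0

        BitSet written = writer.openNrtReader(true);
        System.out.println(written + " files=" + writer.directory.size()); // {2, 5} files=1
    }
}
```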

What sort of custom changes do you have in this part of Lucene?

Mike McCandless

http://blog.mikemccandless.com


Re: StandardDirectoryReader.java:: applyAllDeletes, writeAllDeletes

2017-05-28 Thread Shawn Heisey
On 5/27/2017 8:35 PM, Nawab Zada Asad Iqbal wrote:
> I am looking at the following change in lucene-solr, which doesn't mention any
> JIRA issue. How can I find out more about it?
>
> "1ae7291 Mike McCandless on 1/24/16 at 3:17 PM current patch"

The reason that there's no Jira issue mentioned is that the commit was
first made on an entirely separate git branch -- the one for
LUCENE-5438.  I think I would argue that even commits on special
branches should have the Jira issue in the commit text, because
eventually those commits get merged back to master, where information
about the source branch seems to disappear.

https://issues.apache.org/jira/browse/LUCENE-5438

LUCENE-5438 added an HTTP-based replication capability that Lucene users
could leverage, similar to what Solr already had.

> Specifically, I am interested in what 'writeAllDeletes' does in the
> following method. Let me know if this is a very stupid question and I should
> have done something else before emailing here.
>
> static DirectoryReader open(IndexWriter writer, SegmentInfos infos,
> boolean applyAllDeletes, boolean writeAllDeletes) throws IOException {
>
> Background: We are running Solr 4.5 and upgrading to 6.5.1. We have
> some custom code in this area, which we need to merge.

I found this commit in the archive for the commits mailing list.  The
full commit hash is 1ae7291429bad742715344f86cfa5200229b3698.

https://git1-us-west.apache.org/repos/asf?p=lucene-solr.git;a=commit;h=1ae72914

This was a change in Lucene code; Solr code wasn't touched. The author
of that change, Mike McCandless, is a participant on this list.  There
are posts from him as recently as April 2017.  Because of that, I
hesitate to have you ask your question on the dev list, but if you don't
get a useful reply from somebody soon, you may want to do that.

I wish I could offer you some advice myself, but I'm not familiar with
the low-level Lucene code.

Thanks,
Shawn