Request times metric for collection

2021-03-15 Thread Alex Bulygin

Good afternoon, everyone! Can someone tell me, please: is the request-time
metric available only per core? Are there any aggregates at the collection
level, or per collection + host?
Are there any published best practices for deployments with many cores?
--
Alex Bulygin

Re: Lucene (unexpected ) fsync on existing segments

2021-03-15 Thread Rahul Goswami
Uwe,
I understand that mmap would map only *a part* of the index from virtual
address space into physical memory, as and when the pages are requested.
However the limitation on our side is that in most cases, we cannot ask for
more than 128 GB RAM (and unfortunately even that would be a stretch) for
the Solr machine.

I have read and re-read the article you referenced in the past :) It's
brilliantly written and did help clarify quite a few things for me I must
say. However, at the end of the day, there is only so much the OS (at least
Windows) can do before it starts to swap different pages in a 2-3 TB index
into 64 GB of physical space, isn't that right? The CPU usage spikes to
100% at such times and the machine becomes totally unresponsive. Switching to
SimpleFSDirectory at such times rids us of this issue. I understand
that we are losing out on performance by an order of magnitude compared to
mmap, but I don't know any alternate solution. Also, since most of our use
cases are more write-heavy than read-heavy, we can afford to compromise on
the search performance due to SimpleFS.

Still, please let me know if there is anything about my explanation that
doesn't sound right to you.

Thanks,
Rahul

On Mon, Mar 15, 2021 at 3:54 PM Uwe Schindler  wrote:

> This is not true. Memory mapping does not need to load the index into ram,
> so you don't need so much physical memory. Paging is done only between
> index files and ram, that's what memory mapping is about.
>
> Please read the blog post:
> https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>
> Uwe

Re: Lucene (unexpected ) fsync on existing segments

2021-03-15 Thread Uwe Schindler
This is not true. Memory mapping does not need to load the index into RAM, so
you don't need so much physical memory. Paging happens only between the index
files and RAM; that's what memory mapping is about.

Please read the blog post: 
https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Uwe

Am March 15, 2021 7:43:29 PM UTC schrieb Rahul Goswami :
>Mike,
>Yes I am using a 64 bit JVM on Windows. I haven't tried reproducing the
>issue on Linux yet. In the past we have had problems with mmap on
>Windows
>with the machine freezing. The rationale I gave to myself is the amount
>of
>disk and CPU activity for paging in and out must be intense for the OS
>while trying to map an index that large into 64 GB of heap. Also since
>it's
>an on-premise deployment, we can't expect the customers of the product
>to
>provide nodes with > 400 GB RAM which is what *I think* would be
>required
>to get a decent performance with mmap. Hence we had to switch to
>SimpleFSDirectory.
>
>As for the fsync behavior, you are right. I tried with
>NRTCachingDirectoryFactory as well which defaults to using mmap
>underneath
>and still makes fsync calls for already existing index files.
>
>Thanks,
>Rahul

PFOR for docids?

2021-03-15 Thread Greg Miller
Hi folks-

I'm curious to understand the history/context of using PFOR for positions
and frequencies while continuing to use basic FOR for docid encoding. I've
done my best to turn up any past conversations on this, but wasn't able to
find much. Apologies if I missed it in my digging! From what I've gathered,
the basic FOR encoding was introduced to Lucene with LUCENE-3892 (which was a
continuation of LUCENE-1410). While PFOR had been discussed plenty in the
earlier issues, I gather that it wasn't actually committed until LUCENE-9027.
Hopefully I've got
that much right. And it appears at that time to have been introduced for
positions and frequencies, but not docids.

Is the reasoning here that, a) since docids are delta-encoded already,
outliers/exceptions will be less likely/beneficial, and b) FOR allows for
an optimization in decoding the deltas (via ForUtil#decodeAndPrefixSum)
which can't be utilized with PFOR, since the exceptions must be patched in
before decoding deltas? Are there other reasons FOR continues to be used for
docids that I'm overlooking?
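That ordering constraint can be sketched in plain Python (an illustration of the idea, not the actual ForUtil code): with FOR, the packed values are the deltas themselves, so decoding can fuse the prefix sum; with PFOR, exception values must be patched into the delta array first, and only then can the prefix sum run.

```python
def decode_for(deltas):
    # FOR: deltas are stored as-is, so decoding and the prefix sum
    # (delta -> absolute docid) can be fused into one pass
    docids, acc = [], 0
    for d in deltas:
        acc += d
        docids.append(acc)
    return docids

def decode_pfor(low_bits, exceptions):
    # PFOR: most deltas fit a narrow bit width; outliers are stored
    # separately as (position, high_part) exceptions and patched in first
    deltas = list(low_bits)
    for pos, high in exceptions:
        deltas[pos] |= high
    # only after patching can the prefix sum run
    return decode_for(deltas)

# Example: docids 3, 5, 105, 106 -> deltas 3, 2, 100, 1.
# With a 2-bit width, 100 (0b1100100) is an exception:
# its low 2 bits are 0, the remaining high part is 100.
```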

I'm curious because I recently ran some internal benchmarks on the Amazon
product search engine replacing FOR with PFOR for docid delta encoding,
and saw an index size reduction of 0.93% while also improving our red-line
queries/sec by +1.0%. I expected the index size reduction but wasn't
expecting to see a QPS improvement, which I haven't yet been able to
explain. I'm wondering if there are some good reasons to keep using FOR for
docids, or if there'd be any appetite to discuss using PFOR for everything?
Again, apologies if I've overlooked some past discussion in my digging. Any
history/context is much appreciated!

Cheers,
-Greg


Re: Lucene (unexpected ) fsync on existing segments

2021-03-15 Thread Uwe Schindler
Correction: the Windows limitation applies only up to Windows Server 2012 /
Windows 8. So nowadays you can easily memory-map terabytes of data.

Uwe


Re: Lucene (unexpected ) fsync on existing segments

2021-03-15 Thread Rahul Goswami
Mike,
Yes I am using a 64 bit JVM on Windows. I haven't tried reproducing the
issue on Linux yet. In the past we have had problems with mmap on Windows
with the machine freezing. The rationale I gave to myself is the amount of
disk and CPU activity for paging in and out must be intense for the OS
while trying to map an index that large into 64 GB of heap. Also since it's
an on-premise deployment, we can't expect the customers of the product to
provide nodes with > 400 GB RAM which is what *I think* would be required
to get decent performance with mmap. Hence we had to switch to
SimpleFSDirectory.

As for the fsync behavior, you are right. I tried with
NRTCachingDirectoryFactory as well which defaults to using mmap underneath
and still makes fsync calls for already existing index files.

Thanks,
Rahul


Re: Lucene (unexpected ) fsync on existing segments

2021-03-15 Thread Uwe Schindler
Hi Mike,

Windows unfortunately has a crazy limitation on address space: the number of
usable address bits is limited to 43. See my blog post at
https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

That's 8 terabytes.

On Linux this limit is 47 bits, and with later kernels and hardware it's even
larger, so the whole universe fits into it.
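The arithmetic behind those numbers, as a quick sanity check:

```python
# Usable virtual address space: 43 bits on older Windows vs 47 bits
# on typical x86-64 Linux kernels
TIB = 2 ** 40  # one tebibyte

windows_limit = 2 ** 43   # mappable address space on older Windows
linux_limit = 2 ** 47     # mappable address space on typical Linux

print(windows_limit // TIB, linux_limit // TIB)  # -> 8 128
```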

Uwe


Re: Lucene (unexpected ) fsync on existing segments

2021-03-15 Thread Michael McCandless
Thanks Rahul.

> primary reason being that memory mapping multi-terabyte indexes is not
feasible through mmap

Hmm, that is interesting -- are you using a 64 bit JVM?  If so, what goes
wrong with such large maps?  Lucene's MMapDirectory should chunk the
mapping to deal with ByteBuffer's int-only address space.
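The chunking idea can be illustrated outside Java too. A rough sketch (hypothetical helpers, not the MMapDirectory implementation; Lucene's actual chunk size is configurable, commonly 1 GiB on 64-bit JVMs) that splits one logical file into pieces each small enough to address with a signed 32-bit int:

```python
import mmap
import os

MAX_CHUNK = 1 << 30  # e.g. 1 GiB chunks, each indexable with a 32-bit int

def chunk_offsets(file_size, chunk=MAX_CHUNK):
    """Return (offset, length) pairs covering the file in map-sized pieces."""
    return [(off, min(chunk, file_size - off))
            for off in range(0, file_size, chunk)]

def map_chunks(path, chunk=MAX_CHUNK):
    """Map each piece separately; reads that straddle a boundary are
    handled by the caller via offset arithmetic across the chunk list."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        return [mmap.mmap(f.fileno(), length, offset=off,
                          access=mmap.ACCESS_READ)
                for off, length in chunk_offsets(size, chunk)]
```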

SimpleFSDirectory usually has substantially worse performance than
MMapDirectory.

Still, I suspect you would hit the same issue if you used other FSDirectory
implementations -- the fsync behavior should be the same.

Mike McCandless

http://blog.mikemccandless.com


On Fri, Mar 12, 2021 at 1:46 PM Rahul Goswami  wrote:

> Thanks Michael. For your question...yes I am running Solr on Windows and
> running it with SimpleFSDirectoryFactory (primary reason being that memory
> mapping multi-terabyte indexes is not feasible through mmap). I will create
> a Jira later today with the details in this thread and assign it to myself.
> Will take a shot at the fix.
>
> Thanks,
> Rahul
>
> On Fri, Mar 12, 2021 at 10:00 AM Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> I think long ago we used to track which files were actually dirty (we had
>> written bytes to) and only fsync those ones.  But something went wrong with
>> that, and at some point we "simplified" this logic, I think on the
>> assumption that asking the OS to fsync a file that does in fact exist yet
>> indeed has not changed would be harmless?  But somehow it is not in your
>> case?  Are you on Windows?
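The dirty-file tracking described there fits in a few lines; this is an illustrative Python sketch of the idea only (the class and names are made up here; it is not the actual FSDirectory/staleFiles code):

```python
import os

class TrackingDirectory:
    """Sketch: fsync only files actually written since the last sync."""

    def __init__(self, path):
        self.path = path
        self.stale = set()  # names of files with unsynced writes

    def write(self, name, data):
        with open(os.path.join(self.path, name), "ab") as f:
            f.write(data)
        self.stale.add(name)  # mark dirty

    def sync(self, names):
        # fsync only the intersection of requested names and dirty files;
        # unchanged, already-synced files are skipped entirely
        for name in set(names) & self.stale:
            fd = os.open(os.path.join(self.path, name), os.O_RDONLY)
            try:
                os.fsync(fd)
            finally:
                os.close(fd)
            self.stale.discard(name)
```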
>>
>> I tried to do a bit of digital archaeology and remember what
>> happened here, and I came across this relevant looking issue:
>> https://issues.apache.org/jira/browse/LUCENE-2328.  That issue moved
>> tracking of which files have been written but not yet fsync'd down from
>> IndexWriter into FSDirectory.
>>
>> But there was another change that then removed staleFiles from
>> FSDirectory entirely still trying to find that.  Aha, found it!
>> https://issues.apache.org/jira/browse/LUCENE-6150.  Phew Uwe was really
>> quite upset in that issue ;)
>>
>> I also came across this delightful related issue, showing how a massive
>> hurricane (Irene) can lead to finding and fixing a bug in Lucene!
>> https://issues.apache.org/jira/browse/LUCENE-3418
>>
>> > The assumption is that while the commit point is saved, no changes
>> happen to the segment files in the saved generation.
>>
>> This assumption should really be true.  Lucene writes the files, append
>> only, once, and then never changes them, once they are closed.  Pulling a
>> commit point from Solr should further ensure that, even as indexing
>> continues and new segments are written, the old segments referenced in that
>> commit point will not be deleted.  But apparently this "harmless fsync"
>> Lucene is doing is not so harmless in your use case.  Maybe open an issue
>> and pull out the details from this discussion onto it?
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Fri, Mar 12, 2021 at 9:03 AM Michael Sokolov 
>> wrote:
>>
>>> Also - I should have said - I think the first step here is to write a
>>> focused unit test that demonstrates the existence of the extra fsyncs
>>> that we want to eliminate. It would be awesome if you were able to
>>> create such a thing.
>>>
>>> On Fri, Mar 12, 2021 at 9:00 AM Michael Sokolov 
>>> wrote:
>>> >
>>> > Yes, please go ahead and open an issue. TBH I'm not sure why this is
>>> > happening - there may be a good reason?? But let's explore it using an
>>> > issue, thanks.
>>> >
>>> > On Fri, Mar 12, 2021 at 12:16 AM Rahul Goswami 
>>> wrote:
>>> > >
>>> > > I can create a Jira and assign it to myself if that's ok (?). I
>>> think this can help improve commit performance.
>>> > > Also, to answer your question, we have indexes sometimes going into
>>> multiple terabytes. Using the replication handler for backup would mean
>>> requiring a disk capacity more than 2x the index size on the machine at all
>>> times, which might not be feasible. So we directly back the index up from
>>> the Solr node to a remote repository.
>>> > >
>>> > > Thanks,
>>> > > Rahul
>>> > >
>>> > > On Thu, Mar 11, 2021 at 4:09 PM Michael Sokolov 
>>> wrote:
>>> > >>
>>> > >> Well, it certainly doesn't seem necessary to fsync files that are
>>> > >> unchanged and have already been fsync'ed. Maybe there's an
>>> opportunity
>>> > >> to improve it? On the other hand, support for external processes
>>> > >> reading Lucene index files isn't likely to become a feature of
>>> Lucene.
>>> > >> You might want to consider using Solr replication to power your
>>> > >> backup?
>>> > >>
>>> > >> On Thu, Mar 11, 2021 at 2:52 PM Rahul Goswami <
>>> rahul196...@gmail.com> wrote:
>>> > >> >
>>> > >> > Thanks Michael. I thought since this discussion is closer to the
>>> code than most discussions on the solr-users list, it seemed like a more
>>> appropriate forum. Will be mindful going forward.
>>> > >> > On your point about new segments, I attached a debugger and 

Re: [solr-operator] branch main updated (fc76f66 -> 75830c3)

2021-03-15 Thread Houston Putman
Should be done now.

https://github.com/apache/solr-operator/commit/d236b4f2cc2713c410554cc5166064b566b58342#diff-b4c2a69650f9ac84008ad9f745859a645c9466350ffdfe5f919655039ee2e2c3

- Houston

On Mon, Mar 15, 2021 at 1:37 PM Uwe Schindler  wrote:

> Hi Houston,
>
> Can we change the commit mail address of this repository? Or does this
> need INFRA involvement?
>
> Uwe
>
> -
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> https://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -Original Message-
> > From: hous...@apache.org 
> > Sent: Monday, March 15, 2021 6:01 PM
> > To: comm...@lucene.apache.org
> > Subject: [solr-operator] branch main updated (fc76f66 -> 75830c3)
> >
> > This is an automated email from the ASF dual-hosted git repository.
> >
> > houston pushed a change to branch main
> > in repository https://gitbox.apache.org/repos/asf/solr-operator.git.
> >
> >
> > from fc76f66  Add conditional dependency for zk-operator helm chart
> (#231)
> >  add 75830c3  Upgrade kustomize. Modernize helm manifest generation.
> > (#238)
> >
> > No new revisions were added by this update.
> >
> > Summary of changes:
> >  hack/helm/copy_crds_roles_helm.sh | 34
> +++---
> >  hack/install_dependencies.sh  |  5 ++---
> >  helm/solr-operator/crds/crds.yaml |  3 +++
> >  3 files changed, 24 insertions(+), 18 deletions(-)
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


RE: [solr-operator] branch main updated (fc76f66 -> 75830c3)

2021-03-15 Thread Uwe Schindler
Hi Houston,

Can we change the commit mail address of this repository? Or does this need 
INFRA involvement?

Uwe

-
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: hous...@apache.org 
> Sent: Monday, March 15, 2021 6:01 PM
> To: comm...@lucene.apache.org
> Subject: [solr-operator] branch main updated (fc76f66 -> 75830c3)
> 
> This is an automated email from the ASF dual-hosted git repository.
> 
> houston pushed a change to branch main
> in repository https://gitbox.apache.org/repos/asf/solr-operator.git.
> 
> 
> from fc76f66  Add conditional dependency for zk-operator helm chart (#231)
>  add 75830c3  Upgrade kustomize. Modernize helm manifest generation.
> (#238)
> 
> No new revisions were added by this update.
> 
> Summary of changes:
>  hack/helm/copy_crds_roles_helm.sh | 34 +++---
>  hack/install_dependencies.sh  |  5 ++---
>  helm/solr-operator/crds/crds.yaml |  3 +++
>  3 files changed, 24 insertions(+), 18 deletions(-)


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: [NOTICE] Old git branches will be pruned (in lucene.git repo)

2021-03-15 Thread Uwe Schindler
Deleting unneeded tags is easy, so starting with the automated script is fine.

 

I agree to remove those “working” branches/tags in the lucene and solr repos. 
Once done, disk usage may shrink, as “git gc” will clean up the objects that are 
no longer referenced.

 

Uwe

 

-

Uwe Schindler

Achterdiek 19, D-28357 Bremen

https://www.thetaphi.de

eMail: u...@thetaphi.de

 

From: David Smiley  
Sent: Monday, March 15, 2021 3:30 PM
To: lucene-dev 
Subject: Re: [NOTICE] Old git branches will be pruned (in lucene.git repo)

 

What's the point of even having a tag for "branch_8x", "branch_7x" etc.? Their 
very purpose was to be committed to, and they were constantly moving forward 
as work happened. They will still exist in the lucene-solr repo, so "no 
history is lost" will be true as well.

Having tags for actual releases (e.g. for 8.8, 8.7, etc.) is great for doing 
quick IDE comparisons to see how code changed.




~ David Smiley

Apache Lucene/Solr Search Developer

http://www.linkedin.com/in/davidwsmiley

 

 

On Mon, Mar 15, 2021 at 9:50 AM Jan Høydahl <jan@cominvent.com> wrote:

Hi,

 

With the new lucene.git repo up and running, we (Uwe, Dawid and I) would like to get 
rid of some clutter.

 

We discussed on Slack and later here on the list[1] the option of pruning all the 
112 old branches. It makes no sense to keep stale branch_x_y branches in 
lucene.git repo, as any future 8.x or 7.x release will happen from 
lucene-solr.git, so keeping them as branches in lucene.git is duplication and 
only gives room for developer mistakes. If branch_8_8 does not exist in 
lucene.git repo, no one will push to it, and people will remember to make a patch 
lucene-solr.git instead.

 

So my plan is to remove all branches in the new lucene.git repository and leave 
only the "main" branch. We just did this in solr.git repo (SOLR-15253 [3]).

 

We'll do this by replacing each branch with a git tag, e.g. branch_8x will be 
replaced with tag history/branches/lucene-solr/branch_8x. This is the same 
procedure we did when moving from svn to git. No history is lost!

 

The script I intend to run in a few days is attached on LUCENE-9835 [2].

 

Should you have a work-in-progress on a branch currently scheduled for removal, 
please reply here to exempt it from removal until it is merged.

After the removal you can run "git fetch --prune origin" to not see the remote 
branches in your local clone.

PS: The lucene-solr.git repo, where 8.x development continues, will not be 
affected.

 

[1] 
https://lists.apache.org/thread.html/rc5ac744aa8b081e1e0edb17281d7bb42398a04dcaf6f47421e4a6c41%40%3Cdev.lucene.apache.org%3E
 

 

[2] https://issues.apache.org/jira/browse/LUCENE-9835 

[3] https://issues.apache.org/jira/browse/SOLR-15253



My questions about lucene source

2021-03-15 Thread guohuawu227
I am a software programmer from China. I have been reading the source of Lucene 
version 6.6.0, and I have three questions.

1. In org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter.TermsWriter.pushTerm(BytesRef), 
I cannot understand one line: [ prefixStarts[i] -= prefixTopSize-1 ]. In my opinion, this 
line is not necessary; there is no need to update prefixStarts there, because prefixStarts 
is updated again at the end of the method, from 'pos' to the end of the new text (the 
method's 'text' parameter). If the new text is at least as long as lastTerm, then 
prefixStarts is fully updated at the end. If the new text is shorter, of course not all 
of prefixStarts is updated, but only the entries whose index is smaller than the length 
of lastTerm are ever used, so there is no need to update the rest either. So I think the 
line [ prefixStarts[i] -= prefixTopSize-1 ] is removable.

2. In org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter.TermsWriter.finish(), I 
cannot understand why the line [ pushTerm(new BytesRef()); ] appears twice. I ran some 
tests and found that the second call has no effect; it seems to work fine with just one.

3. About the suffix in the file name: I noticed that the tim/tip file names contain a 
part called the suffix. For example, in '_7_Lucene50_0.tim', '0' is the suffix. I cannot 
understand the purpose of the suffix. In my opinion, the format name ('Lucene50' in this 
example) is required to find the postings format in use, and that is enough. I have read 
the source but still cannot find any use of the suffix. What is it for?

I am sorry my English is not so good. Looking forward to the reply.
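For readers following the first question: the quantity pushTerm computes up front — the length of the byte prefix a new term shares with the previous one — is what drives the prefixStarts bookkeeping being asked about. Here is a toy Python model of that computation over a sorted term stream; this is a simplified illustration only, not the actual block-tree building logic:

```python
def common_prefix_len(a: bytes, b: bytes) -> int:
    """Length of the shared byte prefix of two terms, the value computed
    at the start of pushTerm (simplified illustration)."""
    limit = min(len(a), len(b))
    i = 0
    while i < limit and a[i] == b[i]:
        i += 1
    return i

def shared_prefix_lengths(terms):
    """Walk a sorted sequence of terms and report, for each term, how many
    leading bytes it shares with the previous one. In the real writer this
    value decides which pending prefixes can be flushed as blocks; this toy
    model does none of that block building."""
    last = b""
    out = []
    for term in terms:
        out.append(common_prefix_len(last, term))
        last = term
    return out
```

Whether the specific `prefixStarts[i] -= prefixTopSize-1` update is redundant is a question for the Lucene developers; the sketch above only models the prefix-length input to that bookkeeping.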

Re: [NOTICE] Old git branches will be pruned (in lucene.git repo)

2021-03-15 Thread David Smiley
What's the point of even having a tag for "branch_8x", "branch_7x" etc.?
Their very purpose was to be committed to, and they were constantly
moving forward as work happened.  They will still exist in the
lucene-solr repo, so "no history is lost" will be true as well.
Having tags for actual releases (e.g. for 8.8, 8.7, etc.) is great for
doing quick IDE comparisons to see how code changed.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Mon, Mar 15, 2021 at 9:50 AM Jan Høydahl  wrote:

> Hi,
>
> With the new lucene.git repo up and running, we (Uwe, Dawid and I) would like
> get rid of some clutter.
>
> We discussed on Slack and later here on the list[1] the option of pruning all
> the 112 old branches. It makes no sense to keep stale branch_x_y branches
> in lucene.git repo, as any future 8.x or 7.x release will happen from
> lucene-solr.git, so keeping them as branches in lucene.git is duplication
> and only gives room for developer mistakes. If branch_8_8 does not exist in
> lucene.git repo, no one will push to it, and people will remember to make a patch
> for lucene-solr.git instead.
>
> So my plan is to remove all branches in the new lucene.git repository and
> leave only the "main" branch. We just did this in solr.git repo (SOLR-15253
> [3]).
>
> We'll do this by replacing each branch with a git tag, e.g. branch_8x will
> be replaced with tag history/branches/lucene-solr/branch_8x. This is the
> same procedure we did when moving from svn to git. *No history is lost!*
>
> The script I intend to run in a few days is attached on LUCENE-9835 [2].
>
> *Should you have a work-in-progress on a branch currently scheduled for
> removal, please reply here to exempt it from removal until it is merged.*
>
> After the removal you can run "git fetch --prune origin" to not see the
> remote branches in your local clone.
>
> PS: The lucene-solr.git repo, where 8.x development continues, will not be
> affected.
>
> [1]
> https://lists.apache.org/thread.html/rc5ac744aa8b081e1e0edb17281d7bb42398a04dcaf6f47421e4a6c41%40%3Cdev.lucene.apache.org%3E
> 
> [2] https://issues.apache.org/jira/browse/LUCENE-9835
> [3] https://issues.apache.org/jira/browse/SOLR-15253
>


[NOTICE] Old git branches will be pruned (in lucene.git repo)

2021-03-15 Thread Jan Høydahl
Hi,

With the new lucene.git repo up and running, we (Uwe, Dawid and I) would like to get 
rid of some clutter.

We discussed on Slack and later here on the list[1] the option of pruning all the 
112 old branches. It makes no sense to keep stale branch_x_y branches in 
lucene.git repo, as any future 8.x or 7.x release will happen from 
lucene-solr.git, so keeping them as branches in lucene.git is duplication and 
only gives room for developer mistakes. If branch_8_8 does not exist in 
lucene.git repo, noone will push to it, and rather remember to make a patch for 
lucene-solr.git instead.

So my plan is to remove all branches in the new lucene.git repository and leave 
only the "main" branch. We just did this in solr.git repo (SOLR-15253 [3]).

We'll do this by replacing each branch with a git tag, e.g. branch_8x will be 
replaced with tag history/branches/lucene-solr/branch_8x. This is the same 
procedure we did when moving from svn to git. No history is lost!

The script I intend to run in a few days is attached on LUCENE-9835 [2].

Should you have a work-in-progress on a branch currently scheduled for removal, 
please reply here to exempt it from removal until it is merged.

After the removal you can run "git fetch --prune origin" to not see the remote 
branches in your local clone.

PS: The lucene-solr.git repo, where 8.x development continues, will not be 
affected.

[1] 
https://lists.apache.org/thread.html/rc5ac744aa8b081e1e0edb17281d7bb42398a04dcaf6f47421e4a6c41%40%3Cdev.lucene.apache.org%3E
 

[2] https://issues.apache.org/jira/browse/LUCENE-9835 
[3] https://issues.apache.org/jira/browse/SOLR-15253
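The branch-to-tag replacement described above can be sketched as follows. This is a minimal, local-only sketch; the actual script attached to LUCENE-9835 is the authoritative version, and it also has to push the tags and delete the remote branches:

```python
import subprocess

def replace_branch_with_tag(repo, branch,
                            tag_prefix="history/branches/lucene-solr/"):
    """Replace a local branch with a history tag: create a tag pointing
    at the branch head, then delete the branch. Local-only sketch of the
    pruning step described above; see LUCENE-9835 for the real script."""
    def git(*args):
        subprocess.run(["git", "-C", repo, *args],
                       check=True, capture_output=True, text=True)

    tag = tag_prefix + branch
    git("tag", tag, branch)       # the tag preserves the full history
    git("branch", "-D", branch)   # now the branch itself can go
    return tag
```

After the server-side removal, running "git fetch --prune origin" (as noted above) drops the stale remote-tracking refs from local clones.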

Re: [JENKINS-EA] Lucene-main-Linux (64bit/jdk-16-ea+36) - Build # 29697 - Failure!

2021-03-15 Thread Dawid Weiss
This should be fixed on main.
D.

On Sun, Mar 14, 2021 at 8:22 PM Policeman Jenkins Server
 wrote:
>
> Build: https://jenkins.thetaphi.de/job/Lucene-main-Linux/29697/
> Java: 64bit/jdk-16-ea+36 -XX:+UseCompressedOops -XX:+UseZGC
>
> No tests ran.
>
> -
> To unsubscribe, e-mail: builds-unsubscr...@lucene.apache.org
> For additional commands, e-mail: builds-h...@lucene.apache.org

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org