Re: Static index, fastest way to do forceMerge

2018-12-18 Thread Jerven Tjalling Bolleman

Hi Dawid,

Thanks for looking into this! I have been distracted with other work and
did not get the time I expected to work on it.

Regards,
Jerven
On 11/30/18 12:01 PM, Dawid Weiss wrote:

Just FYI: I implemented a quick and dirty PoC to see how it would work.
Not much of a difference on my machine (since postings merging
dominates everything else). It's an interesting problem how to split it up to
saturate all of the available resources, though (CPU and I/O).

https://issues.apache.org/jira/browse/LUCENE-8580

Dawid
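Here is a toy model of why splitting a merge by index component helps little when one component dominates. It is not Lucene code. With one thread per component and no I/O contention, the wall time is bounded below by the slowest component, so the ideal speedup is the sum of the component times divided by the largest one. The class name and the per-component timings are hypothetical and were picked only to make postings dominate. They are not measurements from this thread.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class MergeSpeedupModel {

    /** Sequential merge: components are merged one after another, so times add up. */
    static double sequentialSeconds(Map<String, Double> componentSeconds) {
        double total = 0;
        for (double s : componentSeconds.values()) {
            total += s;
        }
        return total;
    }

    /** Ideal parallel merge (one thread per component, no I/O contention): the slowest component wins. */
    static double parallelSeconds(Map<String, Double> componentSeconds) {
        double slowest = 0;
        for (double s : componentSeconds.values()) {
            slowest = Math.max(slowest, s);
        }
        return slowest;
    }

    public static void main(String[] args) {
        // Hypothetical timings in seconds, chosen only to show postings dominating.
        Map<String, Double> timings = new LinkedHashMap<>();
        timings.put("postings", 80.0);
        timings.put("storedFields", 10.0);
        timings.put("docValues", 6.0);
        timings.put("norms", 2.0);
        timings.put("points", 2.0);

        double seq = sequentialSeconds(timings);
        double par = parallelSeconds(timings);
        // Prints: sequential=100s parallel>=80s speedup<=1.25x
        System.out.println(String.format(Locale.ROOT,
                "sequential=%.0fs parallel>=%.0fs speedup<=%.2fx", seq, par, seq / par));
    }
}
```

With these made-up numbers, running every component concurrently saves at most 20%. A larger gain would require splitting the postings work itself.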

On Fri, Nov 2, 2018 at 10:17 PM Dawid Weiss  wrote:


Thanks for chipping in, Toke. A ~1TB index is impressive.

Back of the envelope says reading & writing 900GB in 8 hours is
2*900GB/(8*60*60s) = 64MB/s. I don't remember the interface for our
SSD machine, but even with SATA II this is only ~1/5 of the possible
throughput for fairly sequential I/O. So, for us at least, NVMe drives are
not needed for the single-threaded CPU to be the bottleneck.
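The back-of-the-envelope figure can be checked with a few lines of Java. The figure comes out at exactly 64 MB/s when 1 GB is taken as 1024 MB. The comparison uses SATA II's 3 Gbit/s line rate, which gives roughly 300 MB/s of usable bandwidth after 8b/10b encoding. The class and method names are illustrative only.

```java
import java.util.Locale;

public class MergeThroughput {

    /**
     * Average rate needed to read the whole index once and write it once
     * within the given wall-clock time. Uses binary units (1 GB = 1024 MB).
     */
    static double requiredMBPerSecond(double indexGB, double hours) {
        double megabytesMoved = 2 * indexGB * 1024; // read + write
        double seconds = hours * 60 * 60;
        return megabytesMoved / seconds;
    }

    public static void main(String[] args) {
        double needed = requiredMBPerSecond(900, 8);
        // SATA II: 3 Gbit/s line rate, ~300 MB/s usable after 8b/10b encoding.
        double sataIIMBPerSecond = 300;
        // Prints: needed=64.0 MB/s, fraction of SATA II=0.21
        System.out.println(String.format(Locale.ROOT,
                "needed=%.1f MB/s, fraction of SATA II=%.2f", needed, needed / sataIIMBPerSecond));
    }
}
```

Mixing binary (64 MB/s) and decimal (300 MB/s) units makes no difference at this precision. Either way, the result is about a fifth of the link's capacity.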

The mileage will vary depending on the CPU -- if it can merge the data
from multiple files at once fast enough, then it may theoretically
saturate the bandwidth... but I agree: we also seem to be CPU-bound on
these N-to-1 merges, so a regular SSD is enough.
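As a simplified illustration of the per-term work in an N-to-1 merge, consider merging term dictionaries. This is not Lucene's actual merge code. Each segment's terms are already sorted, so the merged dictionary can be produced by a k-way merge over a priority queue. Every term costs heap operations on a single thread. In a real merge, the term's postings would also have to be decoded and re-encoded. This is one plausible reason such merges end up CPU-bound rather than I/O-bound. The class and method names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class KWayTermMerge {

    /** Cursor over one segment's sorted term list. */
    private static final class Cursor {
        final Iterator<String> it;
        String current;

        Cursor(Iterator<String> it) {
            this.it = it;
            this.current = it.next();
        }

        boolean advance() {
            if (it.hasNext()) {
                current = it.next();
                return true;
            }
            return false;
        }
    }

    /** Merges N sorted, duplicate-free term lists into one sorted list without duplicates. */
    static List<String> merge(List<List<String>> segments) {
        PriorityQueue<Cursor> pq = new PriorityQueue<>((a, b) -> a.current.compareTo(b.current));
        for (List<String> seg : segments) {
            if (!seg.isEmpty()) {
                pq.add(new Cursor(seg.iterator()));
            }
        }
        List<String> out = new ArrayList<>();
        while (!pq.isEmpty()) {
            Cursor top = pq.poll();
            String term = top.current;
            // A term present in several segments is emitted once;
            // a real merge would combine its postings at this point.
            if (out.isEmpty() || !out.get(out.size() - 1).equals(term)) {
                out.add(term);
            }
            if (top.advance()) {
                pq.add(top);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> segments = List.of(
                List.of("apple", "lucene", "merge"),
                List.of("index", "lucene"),
                List.of("apple", "zebra"));
        // Prints: [apple, index, lucene, merge, zebra]
        System.out.println(merge(segments));
    }
}
```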


And +1 to the issue BTW.


I agree. Finer granularity here would be a win even in the
regular "merge is a low-priority citizen" case. At least that's what I
tend to think. And if there are spare CPUs, the gain would be
terrific.

Dawid


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org




Re: Static index, fastest way to do forceMerge

2018-12-18 Thread Dawid Weiss
That's fine -- distraction is pretty much what defines my work... ;)

D.

On Tue, Dec 18, 2018 at 11:19 AM Jerven Tjalling Bolleman wrote: