Re: Lucene indexing throughput (and Mike's lucenebench charts)

2016-05-23 Thread Michael McCandless
I finally dug into this, and it turns out the nightly benchmark I run had bad bottlenecks such that it couldn't feed documents quickly enough to Lucene to take advantage of the concurrent hardware in beast2. I fixed that and just re-ran the nightly run and it shows good gains:

Re: Lucene indexing throughput (and Mike's lucenebench charts)

2016-04-15 Thread Robert Muir
You won't see indexing improvements there because the dataset in question is Wikipedia and it is mostly indexing full text. I think it may have one measly numeric field.
On Thu, Apr 14, 2016 at 6:25 PM, Otis Gospodnetić wrote:
> (replying to my original email because I

Re: Lucene indexing throughput (and Mike's lucenebench charts)

2016-04-14 Thread Stephen Green
As someone who runs Lucene on big hardware, I'd be very interested to see the tuning parameters when you do get a chance.
On Thu, Apr 14, 2016 at 3:41 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
> Yes, dual 2699 v3, with 256 GB of RAM, yet indexing throughput somehow
> got slower

Re: Lucene indexing throughput (and Mike's lucenebench charts)

2016-04-14 Thread Michael McCandless
Yes, dual 2699 v3, with 256 GB of RAM, yet indexing throughput somehow got slower :) I haven't re-tuned the indexing threads or the IW buffer size yet for this new hardware ...
Mike McCandless
http://blog.mikemccandless.com
On Thu, Apr 14, 2016 at 2:09 PM, Ishan Chattopadhyaya

Re: Lucene indexing throughput (and Mike's lucenebench charts)

2016-04-14 Thread Ishan Chattopadhyaya
Wow, 72 cores? That sounds astounding. Are they dual Xeon E5 2699 v3 CPUs with 18 cores each, with hyperthreading = 18*2*2 = 72 threads?
On Thu, Apr 14, 2016 at 11:33 PM, Dawid Weiss wrote:
> The GC change is after this:
>
> BJ (2015-12-02): Upgrade to beast2 (72 cores, 256

Lucene indexing throughput (and Mike's lucenebench charts)

2016-04-14 Thread Otis Gospodnetić
Hi,
I was looking at Mike's http://home.apache.org/~mikemccand/lucenebench/indexing.html secretly hoping to spot some recent improvements in indexing throughput but instead it looks like:
* indexing throughput hasn't really gone up in the last ~5 years
* indexing was faster in 2014, but then