Re: Performance regression in getting doc by id in Lucene 8 vs Lucene 7
I'd also love to understand this:

> using SimpleFSDirectoryFactory (since Mmap doesn't quite work well on
> Windows for our index sizes which commonly run north of 1 TB)

Is this a known problem on certain versions of Windows? Normally
memory-mapped IO can scale to very large sizes (well beyond system RAM)
and the OS does the right thing (caches the frequently accessed parts of
the index).

Mike McCandless

http://blog.mikemccandless.com

On Wed, Jun 7, 2023 at 7:23 AM Adrien Grand wrote:

> I agree it's worth discussing. I opened
> https://github.com/apache/lucene/issues/12355 and
> https://github.com/apache/lucene/issues/12356.
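For readers less familiar with the memory-mapped IO discussed in this message, the following is a minimal sketch using only the JDK. It is not Lucene's MMapDirectory; the class name, method names, and temp-file data are all made up for illustration. It maps a small file read-only and reads its bytes from last to first, which after the mapping is set up is just a sequence of memory accesses that the OS serves from its page cache.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

/** Illustrative only: plain JDK memory mapping, not Lucene's MMapDirectory. */
final class MmapReadSketch {

    /** Maps the file read-only and returns its bytes read from last to first. */
    static byte[] readBackwards(Path file) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // One mapping covers at most Integer.MAX_VALUE bytes; bigger files need several.
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte[] out = new byte[map.capacity()];
            for (int i = 0; i < out.length; i++) {
                // Absolute get(): a memory access, no explicit read call per byte.
                out[i] = map.get(out.length - 1 - i);
            }
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes a tiny temp file (hypothetical data) and reads it back through the mapping. */
    static byte[] demo() {
        try {
            Path tmp = Files.createTempFile("mmap-sketch", ".bin");
            tmp.toFile().deleteOnExit();
            Files.write(tmp, new byte[] {10, 20, 30, 40});
            return readBackwards(tmp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo())); // prints [40, 30, 20, 10]
    }
}
```

Whether this behaves well for multi-terabyte indexes on a given Windows version is exactly the open question in this message; the sketch only shows the access pattern.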
Re: Performance regression in getting doc by id in Lucene 8 vs Lucene 7
I agree it's worth discussing. I opened
https://github.com/apache/lucene/issues/12355 and
https://github.com/apache/lucene/issues/12356.

On Tue, Jun 6, 2023 at 9:17 PM Rahul Goswami wrote:
>
> Thanks Adrien. I spent some time trying to understand the readByte() in
> ReverseRandomAccessReader (through FST) and compare with 7.x. Although I
> don't understand ALL of the details and reasoning for always loading the
> FST (and in turn the term index) off-heap (as discussed in
> https://github.com/apache/lucene/issues/10297 ) I understand that this is
> essentially causing disk access for every single byte during readByte().
>
> Does this warrant a JIRA for regression?
>
> As mentioned, I am noticing a 10x slowdown in SegmentTermsEnum.seekExact()
> affecting atomic update performance . For setups like mine that can't use
> mmap due to large indexes this would be a legit regression, no?
>
> - Rahul
Re: Performance regression in getting doc by id in Lucene 8 vs Lucene 7
Thanks Adrien. I spent some time trying to understand readByte() in
ReverseRandomAccessReader (through FST) and compare it with 7.x. Although
I don't understand ALL of the details and reasoning for always loading the
FST (and in turn the terms index) off-heap (as discussed in
https://github.com/apache/lucene/issues/10297), I understand that this is
essentially causing disk access for every single byte during readByte().

Does this warrant a JIRA for regression?

As mentioned, I am noticing a 10x slowdown in SegmentTermsEnum.seekExact()
affecting atomic update performance. For setups like mine that can't use
mmap due to large indexes this would be a legit regression, no?

- Rahul

On Tue, Jun 6, 2023 at 10:09 AM Adrien Grand wrote:

> Yes, this changed in 8.x:
> - 8.0 moved the terms index off-heap for non-PK fields with
> MMapDirectory. https://github.com/apache/lucene/issues/9681
> - Then in 8.6 the FST was moved off-heap all the time.
> https://github.com/apache/lucene/issues/10297
Re: Performance regression in getting doc by id in Lucene 8 vs Lucene 7
Yes, this changed in 8.x:
 - 8.0 moved the terms index off-heap for non-PK fields with
   MMapDirectory. https://github.com/apache/lucene/issues/9681
 - Then in 8.6 the FST was moved off-heap all the time.
   https://github.com/apache/lucene/issues/10297

More generally, there are a few files that are no longer loaded in heap
in 8.x. It should be possible to load them back in heap by doing
something like this (beware, I did not actually test this code):

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;
    import java.util.Collections;

    import org.apache.lucene.store.ByteBuffersDataInput;
    import org.apache.lucene.store.ByteBuffersIndexInput;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FilterDirectory;
    import org.apache.lucene.store.IOContext;
    import org.apache.lucene.store.IndexInput;

    // Wraps another Directory and serves "load"-flagged files from heap.
    class MyHeapDirectory extends FilterDirectory {

      MyHeapDirectory(Directory in) {
        super(in);
      }

      @Override
      public IndexInput openInput(String name, IOContext context) throws IOException {
        if (context.load == false) {
          // Not flagged for loading: read through the wrapped directory as usual.
          return super.openInput(name, context);
        } else {
          try (IndexInput in = super.openInput(name, context)) {
            // Copy the whole file into one heap array (throws if it exceeds int range).
            byte[] bytes = new byte[Math.toIntExact(in.length())];
            in.readBytes(bytes, 0, bytes.length);
            ByteBuffer bb =
                ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).asReadOnlyBuffer();
            return new ByteBuffersIndexInput(
                new ByteBuffersDataInput(Collections.singletonList(bb)),
                "ByteBuffersIndexInput(" + name + ")");
          }
        }
      }
    }

On Tue, Jun 6, 2023 at 3:41 PM Rahul Goswami wrote:
>
> Thanks Adrien. Is this behavior of FST something that has changed in Lucene
> 8.x (from 7.x)?
> Also, is the terms index not loaded into memory anymore in 8.x?
>
> To your point on MMapDirectoryFactory, it is much faster as you
> anticipated, but the indexes commonly being >1 TB makes the Windows machine
> freeze to a point I sometimes can't even connect to the VM.
> SimpleFSDirectory works well for us from that standpoint.
>
> To add, both NIOFS and SimpleFS have similar indexing benchmarks on
> Windows. I understand it is because of the Java bug which synchronizes
> internally in the native call for NIOFs.
>
> -Rahul

--
Adrien

To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
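One detail of the snippet in this message worth making concrete is the `Math.toIntExact(in.length())` call: a single Java `byte[]` cannot hold more than `Integer.MAX_VALUE` bytes, so the heap-loading approach only applies to files below roughly 2 GiB. The sketch below uses only the JDK; the class and method names are hypothetical, and it only illustrates where that limit bites (the snippet itself would throw rather than return -1).

```java
/** Illustrative helper; the names here are made up and are not part of Lucene. */
final class HeapLoadLimit {

    /** Returns the array length needed for a file, or -1 if it cannot fit in one byte[]. */
    static int arrayLengthFor(long fileLength) {
        try {
            // Same conversion as in the snippet: throws ArithmeticException on int overflow.
            return Math.toIntExact(fileLength);
        } catch (ArithmeticException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(arrayLengthFor(1L << 30)); // 1 GiB file: prints 1073741824
        System.out.println(arrayLengthFor(3L << 30)); // 3 GiB file: prints -1
    }
}
```

In practice the files flagged for loading are usually far smaller than the index as a whole, but with indexes north of 1 TB it seems prudent to check their sizes before relying on this.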
Re: Performance regression in getting doc by id in Lucene 8 vs Lucene 7
Thanks Adrien. Is this behavior of FST something that has changed in
Lucene 8.x (from 7.x)?
Also, is the terms index not loaded into memory anymore in 8.x?

To your point on MMapDirectoryFactory, it is much faster as you
anticipated, but the indexes commonly being >1 TB makes the Windows
machine freeze to the point where I sometimes can't even connect to the VM.
SimpleFSDirectory works well for us from that standpoint.

To add, both NIOFS and SimpleFS have similar indexing benchmarks on
Windows. I understand it is because of the Java bug which synchronizes
internally in the native call for NIOFS.

-Rahul

On Tue, Jun 6, 2023 at 9:32 AM Adrien Grand wrote:

> +Alan Woodward helped me better understand what is going on here.
> BufferedIndexInput (used by NIOFSDirectory and SimpleFSDirectory)
> doesn't play well with the fact that the FST reads bytes backwards:
> every call to readByte() triggers a refill of 1kB because it wants to
> read the byte that is just before what the buffer contains.
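The NIOFS versus SimpleFS remark in this message comes down to two ways of reading at an offset. The sketch below uses only the JDK and is not Lucene's implementation; the class name, method names, and temp-file data are hypothetical. A positional read passes the offset with the call, so callers share no file position. A seek-then-read uses the channel's shared position, so the caller has to serialize the pair itself. The thread's claim is that on Windows the JDK synchronizes internally even for the positional form, which would explain why the two directories benchmark similarly there.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Illustrative only: two JDK read styles, not Lucene's NIOFSDirectory or SimpleFSDirectory. */
final class ReadStyles {

    /** Positional read: the offset travels with the call, no shared position is touched. */
    static int positionalRead(FileChannel ch, long pos) throws IOException {
        ByteBuffer one = ByteBuffer.allocate(1);
        if (ch.read(one, pos) != 1) {
            throw new EOFException("no byte at offset " + pos);
        }
        return one.get(0) & 0xFF;
    }

    /** Seek-then-read: the channel position is shared state, so the pair is guarded by a lock. */
    static int seekThenRead(SeekableByteChannel ch, long pos) throws IOException {
        ByteBuffer one = ByteBuffer.allocate(1);
        synchronized (ch) {
            ch.position(pos);
            if (ch.read(one) != 1) {
                throw new EOFException("no byte at offset " + pos);
            }
        }
        return one.get(0) & 0xFF;
    }

    /** Writes bytes 0..255 to a temp file (hypothetical data) and reads two of them back. */
    static int[] demo() {
        try {
            Path tmp = Files.createTempFile("read-styles", ".bin");
            tmp.toFile().deleteOnExit();
            byte[] data = new byte[256];
            for (int i = 0; i < data.length; i++) {
                data[i] = (byte) i;
            }
            Files.write(tmp, data);
            try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
                return new int[] {positionalRead(ch, 200), seekThenRead(ch, 7)};
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println(r[0] + " " + r[1]); // prints 200 7
    }
}
```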
Re: Performance regression in getting doc by id in Lucene 8 vs Lucene 7
+Alan Woodward helped me better understand what is going on here.
BufferedIndexInput (used by NIOFSDirectory and SimpleFSDirectory)
doesn't play well with the fact that the FST reads bytes backwards:
every call to readByte() triggers a refill of 1kB because it wants to
read the byte that is just before what the buffer contains.

On Tue, Jun 6, 2023 at 2:07 PM Adrien Grand wrote:
>
> My best guess based on your description of the issue is that
> SimpleFSDirectory doesn't like the fact that the terms index now reads
> data directly from the directory instead of loading the terms index in
> heap. Would you be able to run the same benchmark with MMapDirectory
> to check if it addresses the regression?

--
Adrien
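The refill behavior described in this message can be reproduced with a toy model. The sketch below is not Lucene's BufferedIndexInput; the class and its fields are hypothetical and only assume what the message states: on a miss, the buffer is refilled starting at the requested position and extending forward, with a 1 kB buffer. Reading forward then costs one refill per buffer's worth of bytes, while reading backward misses on every byte.

```java
/** Toy model of a forward-filling read buffer; NOT Lucene's BufferedIndexInput. */
final class ForwardBufferModel {
    private final byte[] file;     // stands in for the on-disk file
    private final int bufferSize;  // 1024 mirrors the "1kB" refill in the message
    private long bufferStart = 0;  // file offset of the first buffered byte
    private int bufferLength = 0;  // number of valid bytes currently buffered
    private int refills = 0;       // how often we had to go back to "disk"

    ForwardBufferModel(byte[] file, int bufferSize) {
        this.file = file;
        this.bufferSize = bufferSize;
    }

    byte readByteAt(long pos) {
        if (pos < bufferStart || pos >= bufferStart + bufferLength) {
            // Miss: refill starting AT pos and extending forward only.
            bufferStart = pos;
            bufferLength = (int) Math.min(bufferSize, file.length - pos);
            refills++;
        }
        return file[(int) pos];
    }

    /** Reads every byte once, forward or backward, and returns the number of refills. */
    static int countRefills(int fileSize, int bufferSize, boolean backwards) {
        ForwardBufferModel m = new ForwardBufferModel(new byte[fileSize], bufferSize);
        for (int i = 0; i < fileSize; i++) {
            m.readByteAt(backwards ? fileSize - 1 - i : i);
        }
        return m.refills;
    }

    public static void main(String[] args) {
        System.out.println(countRefills(8192, 1024, false)); // forward: prints 8
        System.out.println(countRefills(8192, 1024, true));  // backward: prints 8192
    }
}
```

Under this model the backward scan performs a thousand times more refills than the forward scan over the same 8 kB, which is consistent in spirit with the large slowdown reported earlier in the thread, although the real ratio depends on OS caching and the actual FST access pattern.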
Re: Performance regression in getting doc by id in Lucene 8 vs Lucene 7
My best guess based on your description of the issue is that
SimpleFSDirectory doesn't like the fact that the terms index now reads
data directly from the directory instead of loading the terms index in
heap. Would you be able to run the same benchmark with MMapDirectory
to check if it addresses the regression?

On Tue, Jun 6, 2023 at 5:47 AM Rahul Goswami wrote:
>
> - using SimpleFSDirectoryFactory (since Mmap doesn't quite work well on
> Windows for our index sizes which commonly run north of 1 TB)
>
> Is there a known issue with slowness with TermsEnum.seekExact() in Lucene
> 8.x ?

--
Adrien
Performance regression in getting doc by id in Lucene 8 vs Lucene 7
Hello,

We started experiencing slowness with atomic updates in Solr after
upgrading from 7.7.2 to 8.11.1. Running several tests revealed the
slowness to be in RealTimeGet's SolrIndexSearcher.getFirstMatch() call,
which eventually calls Lucene's SegmentTermsEnum.seekExact().

In the benchmarks I ran, 8.11.1 is about 10x slower than 7.7.2. After
discussion on the Solr mailing list I created the below JIRA:

https://issues.apache.org/jira/browse/SOLR-16838

The thread dumps collected show a lot of threads stuck in the
FST.findTargetArc() method.

Testing environment details:
- Java 11 on Windows server
- Xms1536m Xmx3072m
- Indexing client code running 15 parallel threads indexing in batches of
  1000 on a standalone core.
- using SimpleFSDirectoryFactory (since Mmap doesn't quite work well on
  Windows for our index sizes which commonly run north of 1 TB)

https://drive.google.com/drive/folders/1q2DPNTYQEU6fi3NeXIKJhaoq3KPnms0h?usp=sharing

Is there a known issue with slowness with TermsEnum.seekExact() in Lucene
8.x?

Thanks,
Rahul