Re: Slower fetch document after upgrade >=8.7

2021-04-08 Thread Adrien Grand
I just opened https://issues.apache.org/jira/browse/LUCENE-9917.

On Thu, Apr 8, 2021 at 4:34 PM Никита Михайлов 
wrote:

> Thank you. Understood the policy
>
> > I have some changes to stored fields on my plate, I'll include this
> > change as well.
>
> Is there a ticket for this change?
>
> On Thu, Apr 8, 2021 at 15:30, Adrien Grand wrote:
> >
> > Actually, we don't plan to have flexible settings even for advanced
> > developers. Our stance on these discussions is that we should be
> > opinionated about the default codec and not offer any options. Rather
> > than exposing advanced settings for advanced users, these advanced users
> > can build their own codec and take care of backward compatibility
> > themselves.
> >
> > On Thu, Apr 8, 2021 at 10:11 AM Никита Михайлов <mihaylovniki...@gmail.com>
> > wrote:
> >
> > > Thanks for the reply.
> > >
> > > The problem of understanding. You can make flexible settings for
> > > advanced developers, leaving two facets by default. In tests, check
> > > these facets
> > > Never change them so that the developers themselves explicitly set the
> > > settings. IMHO, I think this will help to avoid such problems
> > >
> > > OK. Have a ticket?
> > >
> > > On Thu, Apr 8, 2021 at 13:52, Adrien Grand wrote:
> > > >
> > > > Thanks for the feedback.
> > > >
> > > > We don't want to offer too many choices, as it complicates backward
> > > > compatibility testing, and want to stick to two options at most.
> > > >
> > > > Since this is the second time I'm seeing this feedback, I'm inclined
> to
> > > > reduce the block size for BEST_SPEED in order to trade a bit of
> > > compression
> > > > ratio for better decompression speed. I have some changes to stored
> > > fields
> > > > on my plate, I'll include this change as well.
> > > >
> > > > On Thu, Apr 8, 2021 at 7:04 AM Никита Михайлов <
> > > mihaylovniki...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi
> > > > > BEST_SPEED has been changed in LUCENE-9447 and LUCENE-9486. For
> this
> > > > > reason, retrieving data from elasticsearch has slowed down by
> 10-20%.
> > > When
> > > > > there is a lot of data, this is critical
> > > > > Can developers leave the choice of which codec to use: LZ4(16kB)
> (old
> > > > > BEST_SPEED) or LZ4 with preset dict(BEST_SPEED_SAVING_DISKSIZE)? Or
> > > make
> > > > > more flexible settings?
> > > > >
> > > > > Otherwise, such changes may be a blocker or will have to spend
> money on
> > > > > buying new hardware
> > > > >
> > > >
> > > >
> > > > --
> > > > Adrien
> > >
> > > -
> > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > > For additional commands, e-mail: java-user-h...@lucene.apache.org
> > >
> > >
> >
> > --
> > Adrien
>

-- 
Adrien


Re: Slower fetch document after upgrade >=8.7

2021-04-08 Thread Никита Михайлов
Thank you. I understand the policy now.

> I have some changes to stored fields on my plate, I'll include this change as 
> well.

Is there a ticket for this change?

On Thu, Apr 8, 2021 at 15:30, Adrien Grand wrote:
>
> Actually, we don't plan to have flexible settings even for advanced
> developers. Our stance on these discussions is that we should be
> opinionated about the default codec and not offer any options. Rather than
> exposing advanced settings for advanced users, these advanced users can
> build their own codec and take care of backward compatibility themselves.
>
> On Thu, Apr 8, 2021 at 10:11 AM Никита Михайлов 
> wrote:
>
> > Thanks for the reply.
> >
> > The problem of understanding. You can make flexible settings for
> > advanced developers, leaving two facets by default. In tests, check
> > these facets
> > Never change them so that the developers themselves explicitly set the
> > settings. IMHO, I think this will help to avoid such problems
> >
> > OK. Have a ticket?
> >
> > On Thu, Apr 8, 2021 at 13:52, Adrien Grand wrote:
> > >
> > > Thanks for the feedback.
> > >
> > > We don't want to offer too many choices, as it complicates backward
> > > compatibility testing, and want to stick to two options at most.
> > >
> > > Since this is the second time I'm seeing this feedback, I'm inclined to
> > > reduce the block size for BEST_SPEED in order to trade a bit of
> > compression
> > > ratio for better decompression speed. I have some changes to stored
> > fields
> > > on my plate, I'll include this change as well.
> > >
> > > On Thu, Apr 8, 2021 at 7:04 AM Никита Михайлов <
> > mihaylovniki...@gmail.com>
> > > wrote:
> > >
> > > > Hi
> > > > BEST_SPEED has been changed in LUCENE-9447 and LUCENE-9486. For this
> > > > reason, retrieving data from elasticsearch has slowed down by 10-20%.
> > When
> > > > there is a lot of data, this is critical
> > > > Can developers leave the choice of which codec to use: LZ4(16kB) (old
> > > > BEST_SPEED) or LZ4 with preset dict(BEST_SPEED_SAVING_DISKSIZE)? Or
> > make
> > > > more flexible settings?
> > > >
> > > > Otherwise, such changes may be a blocker or will have to spend money on
> > > > buying new hardware
> > > >
> > >
> > >
> > > --
> > > Adrien
> >
>
> --
> Adrien




Re: Slower search after 8.5.x to >=8.6

2021-04-08 Thread Никита Михайлов
> It's usually more of a bottleneck when indexing with
> IndexWriter#updateDocument, which needs to perform one ID lookup for every
> indexed document. I guess that the queries that you are running match so
> few hits that very little time is spent reading postings, is that correct?
> But then that would also mean that your queries are running very fast,
> likely on the order of a few millis?

IndexWriter#updateDocument: I did not notice any problems there, but I
only checked briefly.

I tried different searches on different data. Yes, the fewer hits, the
slower the search.

> Or maybe you have misconfigured your merge policy in a way that makes your
> indices have so many segments that terms dictionary lookups may be a
> bottleneck?

The problem is reproducible with different numbers and sizes of
segments, but the fewer segments there are, the smaller the degradation
in speed.
This behavior was also present before the upgrade.

We don't have many segments, and the merge policy did not change.

On Thu, Apr 8, 2021 at 18:50, Adrien Grand wrote:
>
> FSDirectory#open is just a utility method that tries to pick the best
> Directory implementation based on the platform, it's most likely
> MMapDirectory for you, which is the directory implementation we use on all
> 64-bit platforms. So it's intriguing that you are seeing a slowdown with
> MMapDirectory but not with FSDirectory#open. To my knowledge, Elasticsearch
> is not doing anything special that could explain why MMapDirectory is slow
> with Elasticsearch yet fast with Lucene.
>
> Regardless of the Directory implementation, it's surprising that term
> lookups be the bottleneck for query execution. It's usually more a
> bottleneck when indexing with IndexWriter#updateDocument, which needs to
> perform one ID lookup for every indexed document. I guess that the queries
> that you are running match so few hits that very little time is spent
> reading postings, is that correct? But then that would also means that your
> queries are running very fast, likely in the order of a few millis? Or
> maybe you have misconfigured your merge policy in a way that makes your
> indices have so many segments that terms dictionary lookups may be a
> bottleneck?
>
> On Thu, Apr 8, 2021 at 1:40 PM Никита Михайлов 
> wrote:
>
> > Thanks for the answer
> > NIOFSDirectory is like an example. Degradation is also on
> > MMapDirectory and SimpleFSDirectory
> >
> > We are using elasticseach and it has: simplefs (SimpleFsDirectory),
> > niofs (NIOFSDirectory), mmapfs (MMapDirectory) and hybridfs
> > (NIOFSDirectory + MMapDirectory). And for us, while niofs was a little
> > faster than other stores
> >
> > Yes FSDirectory works fast(both commits), but now it is difficult to
> > test on prod elasticseach.
> > But why is FSDirectory fast? How to understand this?
> >
> > On Thu, Apr 8, 2021 at 13:49, Adrien Grand wrote:
> > >
> > > Hello,
> > >
> > > Why are you forcing NIOFSDirectory instead of using Lucene's defaults via
> > > FSDirectory#open? I wonder if this might contribute to the slowdown you
> > are
> > > seeing given that access to the terms index tends to be a bit random.
> > >
> > > It's very unlikely we'll add back a toggle for this as there is no point
> > in
> > > holding the terms index in JVM heap when it could live in the OS cache
> > > instead.
> > >
> > > On Thu, Apr 8, 2021 at 7:57 AM Никита Михайлов <
> > mihaylovniki...@gmail.com>
> > > wrote:
> > >
> > > > Hi. I noticed that after the upgrade from Lucene8.5.x to Lucene >=8.6,
> > > >  search became slower(example TopScoreDocCollector became 20-30%
> > slower,
> > > > from ElasticSearch - 50%).
> > > >
> > > > While testing, I realized that it happened after LUCENE-9257(commit
> > > > e7a61ea). Bug or feature? Can add settings for isOffHeep? To make the
> > > > developer explicitly make this choice
> > > >
> > > > Added a file that shows a simple demo that the search is slow
> > > > Need to run on commit e7a61ea and 90aced5, you will notice how the
> > speed
> > > > drops to 30%
> > > >
> > >
> > >
> > >
> > > --
> > > Adrien
> >
>
> --
> Adrien




Slower search after 8.5.x to >=8.6

2021-04-08 Thread Никита Михайлов
Hi. I noticed that after upgrading from Lucene 8.5.x to Lucene >=8.6,
search became slower (for example, TopScoreDocCollector became 20-30% slower,
and searches through Elasticsearch 50% slower).

While testing, I found that it started with LUCENE-9257 (commit
e7a61ea). Is this a bug or a feature? Could a setting for isOffHeap be
added, so that the developer makes this choice explicitly?

I have attached a file with a simple demo showing that the search is slow.
Run it on commits e7a61ea and 90aced5, and you will notice that the speed
drops by up to 30%.

package org.apache.lucene.demo;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.store.NIOFSDirectory;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.store.Directory;

import org.apache.lucene.index.SegmentReader;
import org.apache.lucene.index.SegmentCommitInfo;
import org.apache.lucene.index.LeafReaderContext;

import java.io.File;
import java.io.IOException;
import java.nio.file.Paths;


public class SpeedLucene {
  public static void main(String[] args) throws IOException, ParseException {
    System.out.println("-Start-");

    // 0. create the analyzer
    StandardAnalyzer analyzer = new StandardAnalyzer();

    // 1. create the index
    boolean isDeleteDir = deleteDirectory(new File("fs_test"));
    Directory index = new NIOFSDirectory(Paths.get("fs_test/"));
    IndexWriterConfig config = new IndexWriterConfig(analyzer);
    // config.setRAMBufferSizeMB(16);

    IndexWriter w = new IndexWriter(index, config);
    for (int x = 0; x < 100; x++) {
      addDoc(w, "Lucene in Action " + x, "1" + x);
    }
    w.close();

    // 2. query
    Query q = new QueryParser("title", analyzer).parse("lucene");

    // 3. search
    int hitsPerPage = 10;
    IndexReader reader = DirectoryReader.open(index);
    IndexSearcher searcher = new IndexSearcher(reader);

    // 3.1 segment info
    int numDocs = 0;
    int numDeletedDocs = 0;
    long sizeInBytes = 0;
    for (LeafReaderContext readerContext : reader.leaves()) {
      final SegmentReader segmentReader = (SegmentReader) readerContext.reader();
      SegmentCommitInfo info = segmentReader.getSegmentInfo();
      numDocs += readerContext.reader().numDocs();
      numDeletedDocs += readerContext.reader().numDeletedDocs();

      long ramBytesUsed = segmentReader.getPostingsReader().ramBytesUsed();

      System.out.println("Codec" + info.info.getCodec());
      System.out.println("Postings ram " + ramBytesUsed + " byte");
      if (segmentReader.getNormsReader() != null) {
        ramBytesUsed += segmentReader.getNormsReader().ramBytesUsed();
        System.out.println("Norms ram " + segmentReader.getNormsReader().ramBytesUsed() + " byte");
      }

      if (segmentReader.getDocValuesReader() != null) {
        ramBytesUsed += segmentReader.getDocValuesReader().ramBytesUsed();
        System.out.println("DocValues ram " + segmentReader.getDocValuesReader().ramBytesUsed() + " byte");
      }

      if (segmentReader.getFieldsReader() != null) {
        ramBytesUsed += segmentReader.getFieldsReader().ramBytesUsed();
        System.out.println("Fields ram " + segmentReader.getFieldsReader().ramBytesUsed() + " byte");
      }

      if (segmentReader.getTermVectorsReader() != null) {
        ramBytesUsed += segmentReader.getTermVectorsReader().ramBytesUsed();
        System.out.println("TermVectors ram " + segmentReader.getTermVectorsReader().ramBytesUsed() + " byte");
      }

      if (segmentReader.getPointsReader() != null) {
        ramBytesUsed += segmentReader.getPointsReader().ramBytesUsed();
        System.out.println("Points ram " + segmentReader.getPointsReader().ramBytesUsed() + " byte");
      }
      System.out.println("---");

      sizeInBytes += ramBytesUsed;
    }
    System.out.println("sizeInBytes " + sizeInBytes + " ");

    TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, hitsPerPage);
    long startTime = System.currentTimeMillis();
    for (int x = 0; x < 10; x++) {
      searcher.search(q, collector);
    }
    System.out.println("Time searcher " + (System.currentTimeMillis() - startTime));
    ScoreDoc[] hits = collector.topDocs().scoreDocs;

    // 4. display results
    startTime = System.currentTimeMillis();
    for (int x = 0; x < 10; x++) {

      for (int i = 0; i < hits.length; ++i) {
        int docId = hits[i].doc;
        Document d

Re: Slower search after 8.5.x to >=8.6

2021-04-08 Thread Adrien Grand
FSDirectory#open is just a utility method that tries to pick the best
Directory implementation based on the platform, it's most likely
MMapDirectory for you, which is the directory implementation we use on all
64-bit platforms. So it's intriguing that you are seeing a slowdown with
MMapDirectory but not with FSDirectory#open. To my knowledge, Elasticsearch
is not doing anything special that could explain why MMapDirectory is slow
with Elasticsearch yet fast with Lucene.
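
For readers comparing the directory implementations mentioned here: NIOFSDirectory reads through positional FileChannel calls, while MMapDirectory reads from a memory-mapped file. The sketch below uses only the JDK (no Lucene classes) to show the two access patterns on random offsets. The class name, method names, file size and lookup count are all made up for illustration; it checks that both paths return the same bytes and is not a benchmark.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Random;

// Illustration only: plain JDK file access, not Lucene's Directory classes.
class RandomAccessDemo {

  // Positional read: every lookup goes through a FileChannel.read call.
  static byte readPositional(FileChannel channel, long pos) throws IOException {
    ByteBuffer one = ByteBuffer.allocate(1);
    if (channel.read(one, pos) != 1) {
      throw new IOException("could not read byte at offset " + pos);
    }
    return one.get(0);
  }

  // Memory-mapped read: a plain memory access once the page is resident.
  static byte readMapped(MappedByteBuffer mapped, int pos) {
    return mapped.get(pos);
  }

  // Writes a temporary file of random bytes, reads random offsets both ways
  // and returns how many lookups disagreed with the data that was written.
  static int countMismatches(int size, int lookups, long seed) {
    try {
      Path file = Files.createTempFile("random-access-demo", ".bin");
      file.toFile().deleteOnExit();
      byte[] data = new byte[size];
      new Random(seed).nextBytes(data);
      Files.write(file, data);
      try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
        MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_ONLY, 0, size);
        Random offsets = new Random(seed + 1);
        int mismatches = 0;
        for (int i = 0; i < lookups; i++) {
          int pos = offsets.nextInt(size);
          byte expected = data[pos];
          if (readPositional(channel, pos) != expected || readMapped(mapped, pos) != expected) {
            mismatches++;
          }
        }
        return mismatches;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("mismatches: " + countMismatches(1 << 20, 10_000, 42L));
  }
}
```

With many small random reads, such as terms index lookups, the positional path pays one read call per lookup, which is one plausible reason the choice of directory matters more once the terms index lives off-heap.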

Regardless of the Directory implementation, it's surprising that term
lookups would be the bottleneck for query execution. It's usually more of a
bottleneck when indexing with IndexWriter#updateDocument, which needs to
perform one ID lookup for every indexed document. I guess that the queries
that you are running match so few hits that very little time is spent
reading postings, is that correct? But then that would also mean that your
queries are running very fast, likely on the order of a few millis? Or
maybe you have misconfigured your merge policy in a way that makes your
indices have so many segments that terms dictionary lookups may be a
bottleneck?
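
To make the segment-count point concrete, here is a toy model rather than Lucene code: each "segment" is a sorted TreeMap standing in for a terms dictionary, and a lookup has to probe every segment's dictionary whether or not the term is there. The class name, the "id-N" term format and the document counts are assumptions chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model: a TreeMap per "segment" plays the role of a terms dictionary.
class SegmentLookupDemo {

  // Distributes docCount IDs round-robin over segmentCount dictionaries.
  static List<TreeMap<String, Integer>> buildSegments(int docCount, int segmentCount) {
    List<TreeMap<String, Integer>> segments = new ArrayList<>();
    for (int s = 0; s < segmentCount; s++) {
      segments.add(new TreeMap<>());
    }
    for (int doc = 0; doc < docCount; doc++) {
      segments.get(doc % segmentCount).put("id-" + doc, doc);
    }
    return segments;
  }

  // Looks a term up in every segment; returns {dictionary probes, hits}.
  static int[] lookup(List<TreeMap<String, Integer>> segments, String term) {
    int probes = 0;
    int hits = 0;
    for (TreeMap<String, Integer> segment : segments) {
      probes++; // one dictionary seek per segment, even when the term is absent
      if (segment.containsKey(term)) {
        hits++;
      }
    }
    return new int[] {probes, hits};
  }

  public static void main(String[] args) {
    for (int segmentCount : new int[] {1, 10, 100}) {
      int[] result = lookup(buildSegments(10_000, segmentCount), "id-1234");
      System.out.println(segmentCount + " segments: " + result[0]
          + " dictionary probes, " + result[1] + " hit(s)");
    }
  }
}
```

The number of hits stays the same while the number of probes grows with the segment count, so for queries that match very few documents the per-segment lookup cost can dominate.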

On Thu, Apr 8, 2021 at 1:40 PM Никита Михайлов 
wrote:

> Thanks for the answer
> NIOFSDirectory is like an example. Degradation is also on
> MMapDirectory and SimpleFSDirectory
>
> We are using elasticseach and it has: simplefs (SimpleFsDirectory),
> niofs (NIOFSDirectory), mmapfs (MMapDirectory) and hybridfs
> (NIOFSDirectory + MMapDirectory). And for us, while niofs was a little
> faster than other stores
>
> Yes FSDirectory works fast(both commits), but now it is difficult to
> test on prod elasticseach.
> But why is FSDirectory fast? How to understand this?
>
> On Thu, Apr 8, 2021 at 13:49, Adrien Grand wrote:
> >
> > Hello,
> >
> > Why are you forcing NIOFSDirectory instead of using Lucene's defaults via
> > FSDirectory#open? I wonder if this might contribute to the slowdown you
> are
> > seeing given that access to the terms index tends to be a bit random.
> >
> > It's very unlikely we'll add back a toggle for this as there is no point
> in
> > holding the terms index in JVM heap when it could live in the OS cache
> > instead.
> >
> > On Thu, Apr 8, 2021 at 7:57 AM Никита Михайлов <
> mihaylovniki...@gmail.com>
> > wrote:
> >
> > > Hi. I noticed that after the upgrade from Lucene8.5.x to Lucene >=8.6,
> > >  search became slower(example TopScoreDocCollector became 20-30%
> slower,
> > > from ElasticSearch - 50%).
> > >
> > > While testing, I realized that it happened after LUCENE-9257(commit
> > > e7a61ea). Bug or feature? Can add settings for isOffHeep? To make the
> > > developer explicitly make this choice
> > >
> > > Added a file that shows a simple demo that the search is slow
> > > Need to run on commit e7a61ea and 90aced5, you will notice how the
> speed
> > > drops to 30%
> > >
> >
> >
> >
> > --
> > Adrien
>

-- 
Adrien


Re: Slower search after 8.5.x to >=8.6

2021-04-08 Thread Никита Михайлов
Thanks for the answer.
NIOFSDirectory is just an example. The degradation also occurs with
MMapDirectory and SimpleFSDirectory.

We are using Elasticsearch, which offers simplefs (SimpleFSDirectory),
niofs (NIOFSDirectory), mmapfs (MMapDirectory) and hybridfs
(NIOFSDirectory + MMapDirectory). For us, niofs has so far been a little
faster than the other stores.

Yes, FSDirectory is fast (on both commits), but for now it is difficult to
test on our production Elasticsearch.
But why is FSDirectory fast? How should I understand this?

On Thu, Apr 8, 2021 at 13:49, Adrien Grand wrote:
>
> Hello,
>
> Why are you forcing NIOFSDirectory instead of using Lucene's defaults via
> FSDirectory#open? I wonder if this might contribute to the slowdown you are
> seeing given that access to the terms index tends to be a bit random.
>
> It's very unlikely we'll add back a toggle for this as there is no point in
> holding the terms index in JVM heap when it could live in the OS cache
> instead.
>
> On Thu, Apr 8, 2021 at 7:57 AM Никита Михайлов 
> wrote:
>
> > Hi. I noticed that after the upgrade from Lucene8.5.x to Lucene >=8.6,
> >  search became slower(example TopScoreDocCollector became 20-30% slower,
> > from ElasticSearch - 50%).
> >
> > While testing, I realized that it happened after LUCENE-9257(commit
> > e7a61ea). Bug or feature? Can add settings for isOffHeep? To make the
> > developer explicitly make this choice
> >
> > Added a file that shows a simple demo that the search is slow
> > Need to run on commit e7a61ea and 90aced5, you will notice how the speed
> > drops to 30%
> >
>
>
>
> --
> Adrien




Re: Slower fetch document after upgrade >=8.7

2021-04-08 Thread Adrien Grand
Actually, we don't plan to have flexible settings even for advanced
developers. Our stance on these discussions is that we should be
opinionated about the default codec and not offer any options. Rather than
exposing advanced settings for advanced users, these advanced users can
build their own codec and take care of backward compatibility themselves.
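
For anyone weighing the custom-codec route, the block-size trade-off discussed in this thread can be illustrated without Lucene. The sketch below is a toy model under stated assumptions: it uses the JDK's DEFLATE (java.util.zip) as a stand-in because the JDK ships no LZ4, it decompresses the whole block for every fetch, and the sample documents and the 4kB and 64kB sizes are arbitrary (only the 16kB figure comes from the thread). All class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Toy model of block-compressed stored fields; DEFLATE stands in for LZ4.
class BlockSizeDemo {

  static byte[] compress(byte[] raw) {
    Deflater deflater = new Deflater(Deflater.BEST_SPEED);
    deflater.setInput(raw);
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    while (!deflater.finished()) {
      out.write(buf, 0, deflater.deflate(buf));
    }
    deflater.end();
    return out.toByteArray();
  }

  static byte[] decompress(byte[] packed, int rawLength) {
    Inflater inflater = new Inflater();
    inflater.setInput(packed);
    byte[] raw = new byte[rawLength];
    try {
      int n = 0;
      while (n < rawLength && !inflater.finished()) {
        n += inflater.inflate(raw, n, rawLength - n);
      }
    } catch (DataFormatException e) {
      throw new IllegalStateException(e);
    } finally {
      inflater.end();
    }
    return raw;
  }

  // Small, repetitive sample documents (made up for this sketch).
  static List<String> sampleDocs(int count) {
    List<String> docs = new ArrayList<>();
    for (int i = 0; i < count; i++) {
      docs.add("{\"id\":" + i + ",\"title\":\"Lucene in Action " + i + "\"}");
    }
    return docs;
  }

  // Packs docs into blocks of at least blockSize raw bytes and compresses each block.
  // Returns {total compressed bytes, average raw bytes decoded to fetch one doc}.
  static long[] measure(List<String> docs, int blockSize) {
    long compressedTotal = 0;
    long decodedTotal = 0;
    ByteArrayOutputStream block = new ByteArrayOutputStream();
    int docsInBlock = 0;
    for (int i = 0; i < docs.size(); i++) {
      byte[] doc = docs.get(i).getBytes(StandardCharsets.UTF_8);
      block.write(doc, 0, doc.length);
      docsInBlock++;
      if (block.size() >= blockSize || i == docs.size() - 1) {
        byte[] raw = block.toByteArray();
        byte[] packed = compress(raw);
        if (!Arrays.equals(raw, decompress(packed, raw.length))) {
          throw new IllegalStateException("round trip failed");
        }
        compressedTotal += packed.length;
        decodedTotal += (long) raw.length * docsInBlock; // each fetch decodes its whole block
        block.reset();
        docsInBlock = 0;
      }
    }
    return new long[] {compressedTotal, decodedTotal / docs.size()};
  }

  public static void main(String[] args) {
    List<String> docs = sampleDocs(20_000);
    for (int kb : new int[] {4, 16, 64}) {
      long[] r = measure(docs, kb * 1024);
      System.out.println(kb + "kB blocks: " + r[0] + " bytes on disk, "
          + r[1] + " raw bytes decoded per fetch");
    }
  }
}
```

Running it prints the on-disk size and the per-fetch decode cost for each block size, which is the trade-off between compression ratio and decompression speed that a custom codec would have to pick for itself.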

On Thu, Apr 8, 2021 at 10:11 AM Никита Михайлов 
wrote:

> Thanks for the reply.
>
> The problem of understanding. You can make flexible settings for
> advanced developers, leaving two facets by default. In tests, check
> these facets
> Never change them so that the developers themselves explicitly set the
> settings. IMHO, I think this will help to avoid such problems
>
> OK. Have a ticket?
>
> On Thu, Apr 8, 2021 at 13:52, Adrien Grand wrote:
> >
> > Thanks for the feedback.
> >
> > We don't want to offer too many choices, as it complicates backward
> > compatibility testing, and want to stick to two options at most.
> >
> > Since this is the second time I'm seeing this feedback, I'm inclined to
> > reduce the block size for BEST_SPEED in order to trade a bit of
> compression
> > ratio for better decompression speed. I have some changes to stored
> fields
> > on my plate, I'll include this change as well.
> >
> > On Thu, Apr 8, 2021 at 7:04 AM Никита Михайлов <
> mihaylovniki...@gmail.com>
> > wrote:
> >
> > > Hi
> > > BEST_SPEED has been changed in LUCENE-9447 and LUCENE-9486. For this
> > > reason, retrieving data from elasticsearch has slowed down by 10-20%.
> When
> > > there is a lot of data, this is critical
> > > Can developers leave the choice of which codec to use: LZ4(16kB) (old
> > > BEST_SPEED) or LZ4 with preset dict(BEST_SPEED_SAVING_DISKSIZE)? Or
> make
> > > more flexible settings?
> > >
> > > Otherwise, such changes may be a blocker or will have to spend money on
> > > buying new hardware
> > >
> >
> >
> > --
> > Adrien
>

-- 
Adrien


Re: Slower fetch document after upgrade >=8.7

2021-04-08 Thread Никита Михайлов
Thanks for the reply.

I understand the problem. You could offer flexible settings for
advanced developers while keeping two options as the defaults. The tests
would check only these two options.
The defaults would never change, so developers would have to set any other
settings explicitly. IMHO this would help to avoid such problems.

OK. Is there a ticket?

On Thu, Apr 8, 2021 at 13:52, Adrien Grand wrote:
>
> Thanks for the feedback.
>
> We don't want to offer too many choices, as it complicates backward
> compatibility testing, and want to stick to two options at most.
>
> Since this is the second time I'm seeing this feedback, I'm inclined to
> reduce the block size for BEST_SPEED in order to trade a bit of compression
> ratio for better decompression speed. I have some changes to stored fields
> on my plate, I'll include this change as well.
>
> On Thu, Apr 8, 2021 at 7:04 AM Никита Михайлов 
> wrote:
>
> > Hi
> > BEST_SPEED was changed in LUCENE-9447 and LUCENE-9486. As a result,
> > retrieving data from Elasticsearch has slowed down by 10-20%. With a
> > lot of data, this is critical.
> > Could developers be given the choice of which codec to use: LZ4 (16kB) (the
> > old BEST_SPEED) or LZ4 with a preset dict (BEST_SPEED_SAVING_DISKSIZE)? Or
> > could more flexible settings be offered?
> >
> > Otherwise, such changes may be a blocker, or we will have to spend money on
> > buying new hardware.
> >
>
>
> --
> Adrien
