Yes, that would be great.

Thanks

On Fri, Dec 14, 2018 at 5:38 PM Edward Ribeiro <edward.ribe...@gmail.com>
wrote:

> Indeed! It clarified a lot, thank you. :) Now I know I got the core reload
> part wrong, but the other aspects were more or less what I had been
> expecting.
>
> Do you think it's worth submitting a PR to the Reference Guide with those
> explanations? I can take a stab at it.
>
> Regards,
> Edward
>
> On Fri, Dec 14, 2018 at 3:08 AM Tomás Fernández Löbbe <
> tomasflo...@gmail.com>
> wrote:
>
> > > >
> > > > No, I am not seeing reloads.
> >
> > Ah, good.
> >
> >
> > > > I am trying to understand the interactions between hard commit, soft
> > > > commit and transaction log updates in a TLOG cluster, for both leader
> > > > and follower replicas. For example, after getting new segments from the
> > > > leader, does the follower replica still apply the hard/soft commit?
> > >
> >
> > Think about the hard commit as a flush of the latest updates to a segment
> > plus a checkpoint pointing to all the currently valid segments. That
> > checkpoint is also a file. The soft commit is similar to the hard commit in
> > the sense that it creates a segment and a pointer to the valid segments;
> > however, those segments may not be flushed to disk yet, and the checkpoint
> > is not written to a file. *In addition* to creating segments, commits in
> > Solr open searchers to get the latest view of the index (hard commits only
> > when openSearcher=true, soft commits always), but that doesn't really
> > matter in the context of replication.
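Tomás's mental model of the two commit types can be sketched as a toy in Python. This is not Solr or Lucene code: the `ToyIndex` class, its fields and the segment names (`_0`, `_1`, ...) are hypothetical. It only tracks which segments exist, which ones the on-disk checkpoint lists, and which ones the open searcher sees.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyIndex:
    """Hypothetical model of one core's index; not Solr's or Lucene's API."""
    buffered: List[str] = field(default_factory=list)       # docs not yet in a segment
    flushed: List[str] = field(default_factory=list)        # segments written to disk
    unflushed: List[str] = field(default_factory=list)      # segments possibly only in RAM
    commit_point: List[str] = field(default_factory=list)   # checkpoint file on disk
    searcher_view: List[str] = field(default_factory=list)  # segments the searcher sees
    next_id: int = 0

    def add(self, doc: str) -> None:
        self.buffered.append(doc)

    def _cut_segment(self) -> None:
        # Turn buffered docs into a new (not necessarily flushed) segment.
        if self.buffered:
            self.unflushed.append(f"_{self.next_id}")
            self.next_id += 1
            self.buffered.clear()

    def soft_commit(self) -> None:
        # New segment plus an in-memory pointer to the valid segments; no
        # checkpoint file is written. Soft commits always open a new searcher.
        self._cut_segment()
        self.searcher_view = self.flushed + self.unflushed

    def hard_commit(self, open_searcher: bool = False) -> None:
        # Flush everything to disk, then write a checkpoint listing all valid
        # segments. A searcher is opened only when openSearcher=true.
        self._cut_segment()
        self.flushed += self.unflushed
        self.unflushed.clear()
        self.commit_point = list(self.flushed)
        if open_searcher:
            self.searcher_view = list(self.flushed)


idx = ToyIndex()
idx.add("doc1")
idx.soft_commit()
print(idx.searcher_view, idx.commit_point)   # ['_0'] []
idx.add("doc2")
idx.hard_commit(open_searcher=False)
print(idx.searcher_view, idx.commit_point)   # ['_0'] ['_0', '_1']
```

The second print shows doc2 durable in the checkpoint but not yet searchable, which is the openSearcher=false case.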
> >
> > The follower replica (TLOG or PULL) asks the leader for the latest hard
> > commit and replicates all the segments plus the file describing that
> > commit. All the TLOG/PULL replica does after replicating is open a searcher
> > over all the segments in that checkpoint. Two important notes here: 1) the
> > follower replica doesn't "perform" a commit, it copies it from the leader,
> > and 2) this "open a searcher" step is not a soft or hard commit, it is just
> > opening a searcher (a "commit" usually involves creating segments).
> >
> > * If you do a soft commit on the leader (a TLOG replica), it'll never
> > make it to the follower, because the follower only replicates the latest
> > hard commit (see ReplicationHandler.indexCommitPoint).
> > * If you do a soft commit on a follower (a TLOG replica), it won't make
> > any difference, because in the TLOG case documents are not added to the
> > index, only to the transaction log (see the UpdateCommand.IGNORE_INDEXWRITER
> > flag).
> > * If you do a soft commit on a follower (a PULL replica), it also won't
> > make any difference, because it doesn't receive documents at all (it only
> > replicates), and the commit is skipped anyway (see
> > DistributedUpdateProcessor.processCommit).
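The follower side can be sketched the same way. This is again a hypothetical toy, not Solr code: `ToyLeader`, `ToyFollower` and their methods are invented names, and each leader method is assumed to mean "index some docs, then commit". It shows that a follower only copies the leader's latest hard commit point and then opens a searcher over it. As a result, a leader soft commit never reaches it, and a follower soft commit changes nothing.

```python
from typing import List


class ToyLeader:
    """Hypothetical TLOG leader; only tracks segment names."""

    def __init__(self) -> None:
        self.segments: List[str] = []      # every segment, soft-committed ones included
        self.commit_point: List[str] = []  # segments listed by the latest hard commit
        self._n = 0

    def _new_segment(self) -> None:
        self._n += 1
        self.segments.append(f"seg{self._n}")

    def index_and_soft_commit(self) -> None:
        # Searchable on the leader, but the commit point is untouched.
        self._new_segment()

    def index_and_hard_commit(self) -> None:
        # New commit point covering every valid segment so far.
        self._new_segment()
        self.commit_point = list(self.segments)


class ToyFollower:
    """Hypothetical TLOG/PULL follower."""

    def __init__(self) -> None:
        self.local_files: List[str] = []
        self.searcher_view: List[str] = []

    def poll(self, leader: ToyLeader) -> List[str]:
        # Ask only for the latest *hard* commit (cf. ReplicationHandler.indexCommitPoint).
        wanted = leader.commit_point
        missing = [s for s in wanted if s not in self.local_files]
        self.local_files = list(wanted)   # copy new files, drop unreferenced ones
        # No commit is performed: the follower just opens a new searcher
        # over the commit point it copied, if that commit point changed.
        if wanted != self.searcher_view:
            self.searcher_view = list(wanted)
        return missing

    def soft_commit(self) -> None:
        # TLOG follower: docs only went to the transaction log.
        # PULL follower: the commit is skipped. Either way, nothing changes.
        pass


leader, follower = ToyLeader(), ToyFollower()
leader.index_and_soft_commit()
print(follower.poll(leader))   # [] (the leader's soft commit is not visible)
leader.index_and_hard_commit()
print(follower.poll(leader))   # ['seg1', 'seg2']
```

Note that seg1, created by the soft commit, only reaches the follower once a later hard commit lists it.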
> >
> > The transaction log is only used for recovery purposes (or realtime get).
> >
> > I hope that clarifies things.
> >
> > >
> > > > PS: congratulations on the Berlin Buzzwords talk. :)
> > >
> > Thanks!
> >
> > > >
> > > > Thanks!
> > > >
> > > > On Mon, Dec 10, 2018 at 9:24 PM Tomás Fernández Löbbe
> > > > <tomasflo...@gmail.com>
> > > > wrote:
> > > >
> > > > > I think this is a good point. The tricky part is that if TLOG replicas
> > > > > don't replicate often, their transaction logs will grow too big, too,
> > > > > so you want the replication interval of TLOG replicas to be tied to the
> > > > > auto(hard)Commit interval (by default, at least). If you are using them
> > > > > for search, you may also not want to open a searcher on each fetch...
> > > > > For PULL replicas, maybe the best way is to use the autoSoftCommit
> > > > > interval to define the polling interval. That said, I'm not sure using
> > > > > different configurations is a good idea; some people may be mixing TLOG
> > > > > and PULL replicas and querying them both alike.
> > > > >
> > > > > In the meantime, if you have different hosts for TLOG and PULL replicas,
> > > > > one workaround is to define the autoCommit time with a system property
> > > > > and use different property values for TLOG vs. PULL nodes.
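Tomás's workaround relies on the `${property:default}` substitution that solrconfig.xml supports. The sketch below is a minimal Python imitation of that substitution, not Solr's implementation. The property name `solr.autoCommit.maxTime` is an assumption: as far as I know it is the one the stock configsets use. The sketch shows how one config file can yield different commit intervals, and therefore different polling intervals, on hosts started with different `-D` values. The halving follows the polling rule described elsewhere in this thread.

```python
import re
from typing import Dict

# Matches ${name} or ${name:default}
_PROP = re.compile(r"\$\{([^}:]+)(?::([^}]*))?\}")


def resolve(template: str, sys_props: Dict[str, str]) -> str:
    """Replace ${name:default} placeholders with system property values."""
    def substitute(m):
        name, default = m.group(1), m.group(2)
        if name in sys_props:
            return sys_props[name]
        if default is None:
            raise KeyError(f"no value for property {name!r}")
        return default
    return _PROP.sub(substitute, template)


# Hypothetical solrconfig.xml value and per-host -D settings.
MAX_TIME = "${solr.autoCommit.maxTime:15000}"
hosts = {
    "tlog-host": {"solr.autoCommit.maxTime": "60000"},
    "pull-host": {"solr.autoCommit.maxTime": "600000"},
    "default-host": {},
}

for host, props in hosts.items():
    commit_ms = int(resolve(MAX_TIME, props))
    print(host, "autoCommit", commit_ms, "ms, poll every", commit_ms // 2, "ms")
```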
> > > > >
> > > > > > There is no commit on TLOG/PULL follower replicas, only on the leader.
> > > > > > Followers fetch the segments and **reload the core** every 150 seconds
> > > > >
> > > > > Edward, "reload" shouldn't really happen in regular TLOG/PULL fetches.
> > > > > Are you seeing reloads?
> > > > >
> > > > > On Mon, Dec 10, 2018 at 4:41 PM Erick Erickson <erickerick...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > bq. but not every poll attempt they fetch new segment from the leader
> > > > > >
> > > > > > Ah, right. Ignore my comment. Commit will only occur on the followers
> > > > > > when there are new segments to pull down, so you're right: roughly
> > > > > > every second poll would find things to bring down and open a new
> > > > > > searcher.
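Erick's "roughly every second poll" follows from Vadim's numbers (hard commit every 300 s, poll every 150 s). Below is a toy count. It assumes commits and polls run on fixed schedules that both start at t=0, that indexing never stops so every commit adds segments, and that a poll landing exactly on a commit sees it. The function name is hypothetical.

```python
from typing import List


def polls_with_new_segments(commit_every_s: int, poll_every_s: int,
                            horizon_s: int) -> List[int]:
    """Poll times (seconds) that find a hard commit newer than the last one seen."""
    last_seen_commit = 0
    hits = []
    for t in range(poll_every_s, horizon_s + 1, poll_every_s):
        latest_commit = (t // commit_every_s) * commit_every_s
        if latest_commit > last_seen_commit:
            last_seen_commit = latest_commit
            hits.append(t)
    return hits


print(polls_with_new_segments(300, 150, 1200))  # [300, 600, 900, 1200]: 4 of 8 polls
```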
> > > > > > On Sun, Dec 9, 2018 at 4:14 PM Edward Ribeiro <edward.ribe...@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > Hi Vadim,
> > > > > > >
> > > > > > > There is no commit on TLOG/PULL follower replicas, only on the leader.
> > > > > > > Followers fetch the segments and **reload the core** every 150 seconds
> > > > > > > (if there were new segments, I suppose). Yeah, followers don't pay the
> > > > > > > CPU price of indexing, but there is still cache invalidation,
> > > > > > > autowarming, etc., in addition to network and I/O demand. Is that
> > > > > > > right, Erick?
> > > > > > >
> > > > > > > Besides that, Erick is pointing out that under a heavy indexing workload
> > > > > > > you could have either:
> > > > > > >
> > > > > > > 1. Very large transaction logs;
> > > > > > >
> > > > > > > 2. Very large numbers of segments. If that is the case, you could have
> > > > > > > the following scenario numerous times:
> > > > > > >    2.1. the follower replica downloads segments A and B from the leader;
> > > > > > >    2.2. the leader merges segments A + B into C;
> > > > > > >    2.3. the follower replicas discard A and B and download C on the
> > > > > > >         next poll.
> > > > > > >
> > > > > > > Under the second condition, followers needlessly download segments
> > > > > > > that are eventually merged away.
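The wasted transfer in steps 2.1 to 2.3 can be counted with a small replay. The sketch makes three assumptions. Each poll sees the leader's full set of segment files, with sizes in arbitrary units. A follower keeps only the files in the latest commit. The merged segment C is as large as A and B combined (no deleted docs purged). The function name is hypothetical.

```python
from typing import Dict, List


def units_downloaded(commit_points: List[Dict[str, int]]) -> int:
    """Total size a follower downloads when replaying successive commit points."""
    local: Dict[str, int] = {}
    total = 0
    for commit in commit_points:
        total += sum(size for name, size in commit.items() if name not in local)
        local = dict(commit)  # files no longer referenced are discarded
    return total


polls = [
    {"A": 100, "B": 100},  # 2.1: follower downloads A and B
    {"C": 200},            # 2.2/2.3: leader merged A + B into C; follower pulls C
]
print(units_downloaded(polls), "units pulled for an index of", sum(polls[-1].values()))
# 400 units pulled for an index of 200
```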
> > > > > > >
> > > > > > > IMO, you should carefully evaluate whether TLOG/PULL is really a good
> > > > > > > fit for your cluster setup and your indexing and querying workload.
> > > > > > > You can very well stay with an NRT setup if it suits you better. The
> > > > > > > videos below provide a nice set of hints on when to choose NRT versus
> > > > > > > some combination of TLOG and PULL.
> > > > > > >
> > > > > > > https://youtu.be/XIb8X3MwVKc
> > > > > > >
> > > > > > > https://youtu.be/dkWy2ykzAv0
> > > > > > >
> > > > > > > https://youtu.be/XqfTjd9KDWU
> > > > > > >
> > > > > > > Regards,
> > > > > > > Edward
> > > > > > >
> > > > > > > On Sun, Dec 9, 2018 16:56, <vadim.iva...@spb.ntk-intourist.ru>
> > > > > > > wrote:
> > > > > > >
> > > > > > > >
> > > > > > > > If the hard commit max time is 300 sec, then a commit happens every
> > > > > > > > 300 sec on the TLOG leader, and new segments pop up on the leader
> > > > > > > > every 300 sec during indexing. The polling interval on the other
> > > > > > > > replicas is 150 sec, but not every poll attempt fetches a new segment
> > > > > > > > from the leader, afaiu. Erick, do you mean that on all the other TLOG
> > > > > > > > replicas (not leaders) a commit occurs on every poll?
> > > > > > > > Sunday, December 9, 2018, 19:21 +03:00, from Erick Erickson
> > > > > > > > <erickerick...@gmail.com>:
> > > > > > > >
> > > > > > > > >Not quite, 600000. The polling interval is half the commit interval.
> > > > > > > > >
> > > > > > > > >This has always bothered me a little bit; I wonder about the utility
> > > > > > > > >of a config param. We already have old-style replication with a
> > > > > > > > >configurable polling interval. Under very heavy indexing loads, it
> > > > > > > > >seems to me that either the tlogs will grow quite large or we'll be
> > > > > > > > >pulling a lot of unnecessary segments across the wire, segments
> > > > > > > > >that'll soon be merged away and the merged segment re-pulled.
> > > > > > > > >
> > > > > > > > >Apparently, though, nobody's seen this "in the wild", so it's
> > > > > > > > >theoretical at this point.
> > > > > > > > >On Sun, Dec 9, 2018 at 1:48 AM Vadim Ivanov
> > > > > > > > ><vadim.iva...@spb.ntk-intourist.ru> wrote:
> > > > > > > > >
> > > > > > > > > Thanks, Edward, for the clues.
> > > > > > > > > What bothers me is the newSearcher start, warming, cache clearing...
> > > > > > > > > all that CPU-consuming stuff in my heavy-indexing scenario.
> > > > > > > > > With NRT I had autoSoftCommit: 300000.
> > > > > > > > > So I had a new searcher no more often than every 5 min on every
> > > > > > > > > replica. To have more or less the same effect with a TLOG-PULL
> > > > > > > > > collection, I suppose I have to have: 300000
> > > > > > > > > (yes, I understand that newSearchers start asynchronously on the
> > > > > > > > > leader and the replicas).
> > > > > > > > > Am I right?
> > > > > > > > > --
> > > > > > > > > Vadim
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >> -----Original Message-----
> > > > > > > > >> From: Edward Ribeiro [mailto:edward.ribe...@gmail.com]
> > > > > > > > >> Sent: Sunday, December 09, 2018 12:42 AM
> > > > > > > > >> To:  solr-user@lucene.apache.org
> > > > > > > > >> Subject: Re: Soft commit and new replica types
> > > > > > > > >>
> > > > > > > > >> Some insights on the new replica types below:
> > > > > > > > >>
> > > > > > > > >> On Sat, December 8, 2018 08:42, Vadim Ivanov
> > > > > > > > >> <vadim.iva...@spb.ntk-intourist.ru> wrote:
> > > > > > > > >>
> > > > > > > > >>>
> > > > > > > > >>> From the Ref Guide we have:
> > > > > > > > >>> "NRT is the only type of replica that supports soft-commits..."
> > > > > > > > >>> "If TLOG replica does become a leader, it will behave the same as
> > > > > > > > >>> if it was a NRT type of replica."
> > > > > > > > >>> Does it mean that if we do not have NRT replicas in the cluster,
> > > > > > > > >>> the autoSoftCommit section in solrconfig.xml is ignored completely
> > > > > > > > >>> (even on the TLOG leader)?
> > > > > > > > >>>
> > > > > > > > >>
> > > > > > > > >> No, not completely. Both TLOG and PULL nodes will periodically poll
> > > > > > > > >> the leader for changes in the index segment files and download those
> > > > > > > > >> segments from the leader. If the hard commit max time is defined in
> > > > > > > > >> solrconfig.xml, the polling interval of each replica will be half
> > > > > > > > >> that value. Otherwise, if the soft commit max time is defined, the
> > > > > > > > >> replicas will use half the soft commit max time as the interval. If
> > > > > > > > >> neither is defined, the poll interval will be 3 seconds (hard-coded).
> > > > > > > > >> See here:
> > > > > > > > >> https://github.com/apache/lucene-solr/blob/75b183196798232aa6f2dcaaaab117f309119053/solr/core/src/java/org/apache/solr/cloud/ReplicateFromLeader.java#L68-L77
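The interval rule Edward describes can be written as a short Python sketch. It paraphrases the rule as stated here and does not copy ReplicateFromLeader. Two choices are assumptions of mine: using `None` for "not configured", and truncating to whole seconds in the HH:MM:SS string (the format old-style replication's pollInterval uses). The function names are hypothetical.

```python
from typing import Optional

DEFAULT_POLL = "00:00:03"   # hard-coded fallback described in the thread


def to_poll_interval_str(ms: int) -> str:
    """Format milliseconds as HH:MM:SS (whole seconds, truncated)."""
    hours, rem = divmod(ms // 1000, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def poll_interval(auto_commit_max_time_ms: Optional[int],
                  auto_soft_commit_max_time_ms: Optional[int]) -> str:
    # 1) Prefer half of the hard (auto)commit maxTime...
    if auto_commit_max_time_ms is not None:
        return to_poll_interval_str(auto_commit_max_time_ms // 2)
    # 2) ...else half of the soft commit maxTime...
    if auto_soft_commit_max_time_ms is not None:
        return to_poll_interval_str(auto_soft_commit_max_time_ms // 2)
    # 3) ...else a fixed 3 seconds.
    return DEFAULT_POLL


print(poll_interval(300000, None))  # 00:02:30
print(poll_interval(600000, None))  # 00:05:00
print(poll_interval(30000, 60000))  # 00:00:15 (hard commit setting wins)
print(poll_interval(None, None))    # 00:00:03
```

With this rule, Vadim's goal of at most one new searcher every 5 minutes on the followers needs an autoCommit maxTime of 600000 ms, which matches Erick's "Not quite, 600000".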
> > > > > > > > >>
> > > > > > > > >> If the TLOG replica is the leader, it will index locally and append
> > > > > > > > >> the doc to its transaction log, as an NRT node would do, and it will
> > > > > > > > >> also synchronously replicate the data to the other TLOG replicas'
> > > > > > > > >> transaction logs (PULL nodes don't have transaction logs). But
> > > > > > > > >> TLOG/PULL replicas don't support soft commits or real-time gets,
> > > > > > > > >> afaik.
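Here is a toy view of where one update lands, following Edward's description. The `Replica` class and the `route_update` function are hypothetical. Only the leader's index changes immediately. TLOG followers get the doc in their transaction log only, and PULL followers get nothing until they fetch segments.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Replica:
    """Hypothetical replica state: what is in its index and its transaction log."""
    kind: str                      # "TLOG" or "PULL"
    is_leader: bool = False
    index: List[str] = field(default_factory=list)
    tlog: List[str] = field(default_factory=list)


def route_update(doc: str, replicas: List[Replica]) -> None:
    leader = next(r for r in replicas if r.is_leader)
    # The TLOG leader behaves like an NRT replica: index locally + append to tlog.
    leader.index.append(doc)
    leader.tlog.append(doc)
    for r in replicas:
        if r is leader:
            continue
        if r.kind == "TLOG":
            # Forwarded synchronously, but only into the follower's transaction
            # log; its index changes later, when it copies segments from the leader.
            r.tlog.append(doc)
        # PULL replicas have no transaction log and receive nothing here.


leader = Replica("TLOG", is_leader=True)
tlog_follower, pull_follower = Replica("TLOG"), Replica("PULL")
route_update("doc1", [leader, tlog_follower, pull_follower])
for name, r in [("leader", leader), ("tlog", tlog_follower), ("pull", pull_follower)]:
    print(name, r.index, r.tlog)
```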
> > > > > > > > >>
> > > > > > > > >>>
> > > > > > > > >>
> > > > > > > > >>>
> > > > > > > > >>> <autoSoftCommit>
> > > > > > > > >>>   <maxTime>60000</maxTime>
> > > > > > > > >>> </autoSoftCommit>
> > > > > > > > >>>
> > > > > > > > >>>
> > > > > > > > >>> Should we say that in the autoCommit section openSearcher is always
> > > > > > > > >>> true in that case?
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>
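> > > > > > > > >> <autoCommit>
> > > > > > > > >>   <maxDocs>10000</maxDocs>
> > > > > > > > >>   <maxTime>30000</maxTime>
> > > > > > > > >>   <maxSize>512m</maxSize>
> > > > > > > > >>   <openSearcher>false</openSearcher>
> > > > > > > > >> </autoCommit>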
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> Does it mean that a new searcher always starts on all replicas when a
> > > > > > > > >> hard commit happens on the leader?
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> Nope. Or at least, the searcher is not created synchronously. Each
> > > > > > > > >> non-leader replica will periodically fetch the index changes from the
> > > > > > > > >> leader and open a new searcher to reflect those changes, as seen here:
> > > > > > > > >> https://github.com/apache/lucene-solr/blob/75b183196798232aa6f2dcaaaab117f309119053/solr/core/src/java/org/apache/solr/handler/IndexFetcher.java#L653
> > > > > > > > >> But it's important to note the potential delay between the leader's
> > > > > > > > >> hard commit and the other replicas fetching those changes from the
> > > > > > > > >> leader and opening a new searcher to reflect the latest changes.
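To put a rough number on that delay, here is a toy simulation. The function name and parameters are hypothetical, and the model makes four simplifying assumptions:

- hard commits fire on a fixed schedule every `commit_s` seconds (real autoCommit timing is driven by pending updates);
- polls fire every `poll_s` seconds, starting at `poll_phase_s`;
- a poll at the same instant as a commit sees it;
- fetch and searcher warm-up time is ignored.

```python
def max_visibility_lag(commit_s: int, poll_s: int, poll_phase_s: int,
                       horizon_s: int) -> int:
    """Worst delay, over whole-second arrival times, from a doc reaching the
    leader to the first follower poll that can see it."""
    worst = 0
    for arrival in range(horizon_s):
        commit = (arrival // commit_s + 1) * commit_s        # first hard commit after arrival
        k = max(-(-(commit - poll_phase_s) // poll_s), 0)    # ceil division
        poll = poll_phase_s + k * poll_s                     # first poll at or after the commit
        worst = max(worst, poll - arrival)
    return worst


print(max_visibility_lag(300, 150, 0, 3000))    # 300: polls aligned with commits
print(max_visibility_lag(300, 150, 149, 3000))  # 449: a poll just misses each commit
```

So with 300 s commits and 150 s polls, a document may take up to roughly the commit interval plus the poll interval (about 450 s here) before followers can search it, plus fetch and warm-up time.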
> > > > > > > > >>
> > > > > > > > >> PS: I am still digging into these new replica types, so I may have
> > > > > > > > >> misunderstood or missed some aspect of them.
> > > > > > > > >>
> > > > > > > > >> Regards,
> > > > > > > > >> Edward
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > > >
> > >
> > >
> >
>
