Re: Solr and commits

2020-08-13 Thread Erick Erickson
Here’s a long explanation:
https://lucidworks.com/post/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Short explanation:
1> yes
2> when the hard commit setting in solrconfig.xml kicks in, regardless of 
the openSearcher setting.
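
For reference, both settings live in the <updateHandler> section of 
solrconfig.xml; a minimal sketch (the intervals here are just illustrative, 
not recommendations):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit: flushes index data to disk and truncates the tlog.
       With openSearcher=false it does not change visibility. -->
  <autoCommit>
    <maxTime>60000</maxTime>            <!-- every 60 seconds -->
    <openSearcher>false</openSearcher>
  </autoCommit>

  <!-- Soft commit: makes new documents visible to searches. -->
  <autoSoftCommit>
    <maxTime>5000</maxTime>             <!-- every 5 seconds -->
  </autoSoftCommit>
</updateHandler>
```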

Best,
Erick

> On Aug 13, 2020, at 1:43 AM, Jayadevan Maymala  
> wrote:
> 
> Hi all,
> 
> A few doubts about commits.
> 
> 1) If no commit parameters are passed from a client (Solarium) update, will
> the autoSoftCommit values automatically take effect?
> 2) When we are not committing from the client, when will the data actually
> be flushed to disk?
> 
> Regards,
> Jayadevan



Solr and commits

2020-08-12 Thread Jayadevan Maymala
Hi all,

A few doubts about commits.

1) If no commit parameters are passed from a client (Solarium) update, will
the autoSoftCommit values automatically take effect?
2) When we are not committing from the client, when will the data actually
be flushed to disk?

Regards,
Jayadevan


Re: Solr soft commits

2018-05-11 Thread Shawn Heisey

On 5/10/2018 8:28 PM, Shivam Omar wrote:

Thanks Shawn. So there are cases when a soft commit will not be faster than a 
hard commit with openSearcher=true. We have a case where we have to do bulk 
deletions; in that case, will a soft commit be faster than hard commits?


I actually have no idea whether deletions get put in memory by the 
NRTCachingDirectory or not.  If they don't, then soft commits with 
deletes would have no performance advantages over hard commits.  
Somebody who knows the Lucene code REALLY well will need to comment here.



Does it mean that once the memory threshold is crossed, soft commits will lead 
Lucene to flush data to disk as in a hard commit? Also, does a soft commit have 
a higher query-time performance cost than a hard commit?


If the machine has enough memory to effectively cache the index, then a 
query after a hard commit should be just as fast as a query after a soft 
commit.  When Solr must actually read the disk to process a query, 
that's when things get slow.  If the machine has enough memory (not 
assigned to any program) for effective disk caching, then the data it 
needs to process a query will be in memory regardless of what kind of 
commit is done.


Thanks,
Shawn



Re: Solr soft commits

2018-05-10 Thread Mark Miller
A soft commit does not control merging. The IndexWriter controls merging,
and hard commits go through the IndexWriter. A soft commit tells Solr to
try to open a new SolrIndexSearcher with the latest view of the index. It
does this with a mix of using the on-disk index and talking to the
IndexWriter to see updates that have not been committed.

Opening a new SolrIndexSearcher using the IndexWriter this way does have a
cost. You may flush segments, you may apply deletes, you may have to
rebuild partial or full in memory data structures. It's generally much
faster than a hard commit to get a refreshed view of the index though.

Given how SolrCloud was designed, it's usually best to set an auto hard
commit to something that works for you, given how large it will make tlogs
(affecting recovery times), and how much RAM is used. Then use soft commits
for visibility. It's best to use them as infrequently as your use case
allows.
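
Mark's advice maps onto solrconfig.xml roughly like this (a sketch with 
illustrative values only; the right hard-commit interval depends on how large 
you can let the tlogs grow and how long recovery may take):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Transaction log, replayed on recovery; truncated at each hard commit,
       so longer autoCommit intervals mean bigger tlogs and slower recovery. -->
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>

  <!-- Durability: periodic hard commit, no new searcher. -->
  <autoCommit>
    <maxTime>300000</maxTime>           <!-- 5 minutes, illustrative -->
    <openSearcher>false</openSearcher>
  </autoCommit>

  <!-- Visibility: soft commit, as infrequent as the use case allows. -->
  <autoSoftCommit>
    <maxTime>30000</maxTime>            <!-- 30 seconds, illustrative -->
  </autoSoftCommit>
</updateHandler>
```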

- Mark

On Thu, May 10, 2018 at 10:49 AM Shivam Omar <shivam.o...@jeevansathi.com>
wrote:

> Hi,
>
> I need some help in understanding Solr soft commits. As soft commits are
> about visibility and are fast in nature, they are advised for NRT use
> cases. I want to understand whether a soft commit also honors merge
> policies and does segment merging for docs in memory. For example, if I
> keep the hard commit interval very high and allow a few million documents
> to be in memory by using soft commits with no hard commit, can it affect
> Solr query-time performance?
>
>
> Shivam
>
> Get Outlook for Android<https://aka.ms/ghei36>
>
> DISCLAIMER
> This email and any files transmitted with it are intended solely for the
> person or the entity to whom they are addressed and may contain information
> which is Confidential and Privileged. Any misuse of the information
> contained in this email, including but not limited to retransmission or
> dissemination of the said information by person or entities other than the
> intended recipient is unauthorized and strictly prohibited. If you are not
> the intended recipient of this email, please delete this email and contact
> the sender immediately.
>
-- 
- Mark
about.me/markrmiller


Re: Solr soft commits

2018-05-10 Thread Shivam Omar


From: Shawn Heisey
Sent: Thursday, May 10, 9:43 PM
Subject: Re: Solr soft commits
To: solr-user@lucene.apache.org


On 5/10/2018 9:48 AM, Shivam Omar wrote:
> I need some help in understanding Solr soft commits. As soft commits are
> about visibility and are fast in nature, they are advised for NRT use cases.

Soft commits *MIGHT* be faster than hard commits.  There are situations where 
the performance of a soft commit and a hard commit with openSearcher=true 
will be about the same, particularly if indexing is very heavy.

Thanks Shawn. So there are cases when a soft commit will not be faster than a 
hard commit with openSearcher=true. We have a case where we have to do bulk 
deletions; in that case, will a soft commit be faster than hard commits?

> I want to understand whether a soft commit also honors merge policies and
> does segment merging for docs in memory. For example, if I keep the hard
> commit interval very high and allow a few million documents to be in memory
> by using soft commits with no hard commit, can it affect Solr query-time
> performance?
>
> Segments in memory are very likely not eligible for merging, but I do not
> actually know whether that is the case. Using soft commits will NOT keep
> millions of documents in memory.  Solr uses the NRTCachingDirectoryFactory
> from Lucene by default, and uses it with default values, which are far too
> low to accommodate millions of documents.  See the Javadoc for the directory
> to see what those defaults are:
>
> https://lucene.apache.org/core/7_3_0/core/org/apache/lucene/store/NRTCachingDirectory.html
>
> That page shows a directory creation with memory values of 5 and 60 MB, but
> the defaults in the factory code (which is what Solr normally uses) are 4
> and 48.  I'm pretty sure that you can increase these values in
> solrconfig.xml, but really large values are not recommended.  Large enough
> values to accommodate millions of documents would require the Java heap to
> also be large, likely with no real performance advantage.
>
> If segment sizes exceed these values, then they will not be cached in
> memory.  Older segments and segments that do not meet the size requirements
> are flushed to disk.

Does it mean that once the memory threshold is crossed, soft commits will lead 
Lucene to flush data to disk as in a hard commit? Also, does a soft commit have 
a higher query-time performance cost than a hard commit?

Thanks, Shawn



Re: Solr soft commits

2018-05-10 Thread Shawn Heisey

On 5/10/2018 9:48 AM, Shivam Omar wrote:

I need some help in understanding Solr soft commits.  As soft commits are about 
visibility and are fast in nature, they are advised for NRT use cases.


Soft commits *MIGHT* be faster than hard commits.  There are situations 
where the performance of a soft commit and a hard commit with 
openSearcher=true will be about the same, particularly if indexing is 
very heavy.



I want to understand whether a soft commit also honors merge policies and does 
segment merging for docs in memory. For example, if I keep the hard commit 
interval very high and allow a few million documents to be in memory by using 
soft commits with no hard commit, can it affect Solr query-time performance?


Segments in memory are very likely not eligible for merging, but I do 
not actually know whether that is the case.


Using soft commits will NOT keep millions of documents in memory.  Solr 
uses the NRTCachingDirectoryFactory from Lucene by default, and uses it 
with default values, which are far too low to accommodate millions of 
documents.  See the Javadoc for the directory to see what those defaults 
are:


https://lucene.apache.org/core/7_3_0/core/org/apache/lucene/store/NRTCachingDirectory.html

That page shows a directory creation with memory values of 5 and 60 MB, 
but the defaults in the factory code (which is what Solr normally uses) 
are 4 and 48.  I'm pretty sure that you can increase these values in 
solrconfig.xml, but really large values are not recommended.  Large 
enough values to accommodate millions of documents would require the 
Java heap to also be large, likely with no real performance advantage.


If segment sizes exceed these values, then they will not be cached in 
memory.  Older segments and segments that do not meet the size 
requirements are flushed to disk.
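
If you do want to raise those thresholds, the directory factory can be 
configured in solrconfig.xml; a sketch (parameter names as I understand 
NRTCachingDirectoryFactory to accept them; verify against your Solr version 
before relying on this):

```xml
<directoryFactory name="DirectoryFactory"
                  class="solr.NRTCachingDirectoryFactory">
  <!-- Segments larger than this are written straight to disk. -->
  <double name="maxMergeSizeMB">4</double>
  <!-- Total RAM the directory may use to cache small segments. -->
  <double name="maxCachedMB">48</double>
</directoryFactory>
```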


Thanks,
Shawn



Solr soft commits

2018-05-10 Thread Shivam Omar
Hi,

I need some help in understanding Solr soft commits.  As soft commits are about 
visibility and are fast in nature, they are advised for NRT use cases. I want 
to understand whether a soft commit also honors merge policies and does segment 
merging for docs in memory. For example, if I keep the hard commit interval 
very high and allow a few million documents to be in memory by using soft 
commits with no hard commit, can it affect Solr query-time performance?


Shivam

Get Outlook for Android<https://aka.ms/ghei36>



Re: Solr Cloud, Commits and Master/Slave configuration

2012-03-01 Thread eks dev
Thanks Mark,
Good, this is probably good enough to give it a try. My analyzers are
normally fast, so doing duplicate analysis (at each replica) is
probably not going to cost a lot, if there is some decent batching.

Can this be somehow controlled (depth of this buffer / time till flush
or some such)? Which events trigger this flushing to replicas
(softCommit, commit, something new)?

What I found useful is to always think in terms of incremental (low
latency) and batch (high throughput) updates. I just then need some
knobs to tweak behavior of this update process.

I would really like to move away from Master/Slave; Cloud makes a lot
of things way simpler for us users ... Will give it a try in a couple
of weeks.

Later we can even think about putting replication at segment level for
extremely expensive analysis, batch cases, or initial cluster
seeding as a replication option. But this is then just an
optimization.

Cheers,
eks


On Thu, Mar 1, 2012 at 5:24 AM, Mark Miller markrmil...@gmail.com wrote:
 We actually do currently batch updates - we are being somewhat loose when we 
 say a document at a time. There is a buffer of updates per replica that gets 
 flushed depending on the requests coming through and the buffer size.

 - Mark Miller
 lucidimagination.com

 On Feb 28, 2012, at 3:38 AM, eks dev wrote:

 SolrCloud is going to be great, the NRT feature is a really huge step
 forward, as well as central configuration, elasticity ...

 The only thing I do not yet understand is the treatment of cases that were
 traditionally covered by a Master/Slave setup: batch updates.

 If I get it right (?), updates to replicas are sent one by one,
 meaning when one server receives an update, it gets forwarded to all
 replicas. This is great for the reduced update latency case, but I do not
 know how it is implemented if you hit it with a batch update. This
 would cause a huge amount of update commands going to replicas. Not so
 good for throughput.

 - Master/Slave does distribution at segment level (no need to
 replicate analysis, far less network traffic). Good for batch updates
 - SolrCloud does distribution per update command (low latency, but chatty,
 and the analysis step is done N_Servers times). Good for incremental updates

 Ideally, some sort of batching is going to be available in
 SolrCloud, with some control over it, e.g. forward batches of 1000
 documents (basically keep the update log slightly longer and forward it as
 a batch update command). This would still cause duplicate analysis,
 but would reduce network traffic.

 Please bear in mind, this is more of a question than a statement; I
 didn't look at the cloud code. It might be I am completely wrong here!





 On Tue, Feb 28, 2012 at 4:01 AM, Erick Erickson erickerick...@gmail.com 
 wrote:
 As I understand it (and I'm just getting into SolrCloud myself), you can
 essentially forget about master/slave stuff. If you're using NRT,
 the soft commit will make the docs visible, you don't need to do a hard
 commit (unlike the master/slave days). Essentially, the update is sent
 to each shard leader and then fanned out into the replicas for that
 leader. All automatically. Leaders are elected automatically. ZooKeeper
 is used to keep the cluster information.

 Additionally, SolrCloud keeps a transaction log of the updates, and replays
 them if the indexing is interrupted, so you don't risk data loss the way
 you used to.

 There aren't really masters/slaves in the old sense any more, so
 you have to get out of that thought-mode (it's hard, I know).

 The code is under pretty active development, so any feedback is
 valuable

 Best
 Erick

 On Mon, Feb 27, 2012 at 3:26 AM, roz dev rozde...@gmail.com wrote:
 Hi All,

 I am trying to understand features of Solr Cloud, regarding commits and
 scaling.


   - If I am using Solr Cloud then do I need to explicitly call commit
   (hard-commit)? Or, a soft commit is okay and Solr Cloud will do the job 
 of
   writing to disk?


   - Do We still need to use  Master/Slave setup to scale searching? If we
   have to use Master/Slave setup then do i need to issue hard-commit to 
 make
   my changes visible to slaves?
   - If I were to use NRT with Master/Slave setup with soft commit then
   will the slave be able to see changes made on master with soft commit?

 Any inputs are welcome.

 Thanks

 -Saroj


Re: Solr Cloud, Commits and Master/Slave configuration

2012-02-29 Thread Mark Miller
We actually do currently batch updates - we are being somewhat loose when we 
say a document at a time. There is a buffer of updates per replica that gets 
flushed depending on the requests coming through and the buffer size.

- Mark Miller
lucidimagination.com

On Feb 28, 2012, at 3:38 AM, eks dev wrote:

 SolrCloud is going to be great, the NRT feature is a really huge step
 forward, as well as central configuration, elasticity ...

 The only thing I do not yet understand is the treatment of cases that were
 traditionally covered by a Master/Slave setup: batch updates.

 If I get it right (?), updates to replicas are sent one by one,
 meaning when one server receives an update, it gets forwarded to all
 replicas. This is great for the reduced update latency case, but I do not
 know how it is implemented if you hit it with a batch update. This
 would cause a huge amount of update commands going to replicas. Not so
 good for throughput.

 - Master/Slave does distribution at segment level (no need to
 replicate analysis, far less network traffic). Good for batch updates
 - SolrCloud does distribution per update command (low latency, but chatty,
 and the analysis step is done N_Servers times). Good for incremental updates

 Ideally, some sort of batching is going to be available in
 SolrCloud, with some control over it, e.g. forward batches of 1000
 documents (basically keep the update log slightly longer and forward it as
 a batch update command). This would still cause duplicate analysis,
 but would reduce network traffic.

 Please bear in mind, this is more of a question than a statement; I
 didn't look at the cloud code. It might be I am completely wrong here!
 
 
 
 
 
 On Tue, Feb 28, 2012 at 4:01 AM, Erick Erickson erickerick...@gmail.com 
 wrote:
 As I understand it (and I'm just getting into SolrCloud myself), you can
 essentially forget about master/slave stuff. If you're using NRT,
 the soft commit will make the docs visible, you don't need to do a hard
 commit (unlike the master/slave days). Essentially, the update is sent
 to each shard leader and then fanned out into the replicas for that
 leader. All automatically. Leaders are elected automatically. ZooKeeper
 is used to keep the cluster information.
 
 Additionally, SolrCloud keeps a transaction log of the updates, and replays
 them if the indexing is interrupted, so you don't risk data loss the way
 you used to.
 
 There aren't really masters/slaves in the old sense any more, so
 you have to get out of that thought-mode (it's hard, I know).
 
 The code is under pretty active development, so any feedback is
 valuable
 
 Best
 Erick
 
 On Mon, Feb 27, 2012 at 3:26 AM, roz dev rozde...@gmail.com wrote:
 Hi All,
 
 I am trying to understand features of Solr Cloud, regarding commits and
 scaling.
 
 
   - If I am using Solr Cloud then do I need to explicitly call commit
   (hard-commit)? Or, a soft commit is okay and Solr Cloud will do the job of
   writing to disk?
 
 
   - Do We still need to use  Master/Slave setup to scale searching? If we
   have to use Master/Slave setup then do i need to issue hard-commit to make
   my changes visible to slaves?
   - If I were to use NRT with Master/Slave setup with soft commit then
   will the slave be able to see changes made on master with soft commit?
 
 Any inputs are welcome.
 
 Thanks
 
 -Saroj


Re: Solr Cloud, Commits and Master/Slave configuration

2012-02-28 Thread eks dev
SolrCloud is going to be great, the NRT feature is a really huge step
forward, as well as central configuration, elasticity ...

The only thing I do not yet understand is the treatment of cases that were
traditionally covered by a Master/Slave setup: batch updates.

If I get it right (?), updates to replicas are sent one by one,
meaning when one server receives an update, it gets forwarded to all
replicas. This is great for the reduced update latency case, but I do not
know how it is implemented if you hit it with a batch update. This
would cause a huge amount of update commands going to replicas. Not so
good for throughput.

- Master/Slave does distribution at segment level (no need to
replicate analysis, far less network traffic). Good for batch updates
- SolrCloud does distribution per update command (low latency, but chatty,
and the analysis step is done N_Servers times). Good for incremental updates

Ideally, some sort of batching is going to be available in
SolrCloud, with some control over it, e.g. forward batches of 1000
documents (basically keep the update log slightly longer and forward it as
a batch update command). This would still cause duplicate analysis,
but would reduce network traffic.

Please bear in mind, this is more of a question than a statement; I
didn't look at the cloud code. It might be I am completely wrong here!





On Tue, Feb 28, 2012 at 4:01 AM, Erick Erickson erickerick...@gmail.com wrote:
 As I understand it (and I'm just getting into SolrCloud myself), you can
 essentially forget about master/slave stuff. If you're using NRT,
 the soft commit will make the docs visible, you don't need to do a hard
 commit (unlike the master/slave days). Essentially, the update is sent
 to each shard leader and then fanned out into the replicas for that
 leader. All automatically. Leaders are elected automatically. ZooKeeper
 is used to keep the cluster information.

 Additionally, SolrCloud keeps a transaction log of the updates, and replays
 them if the indexing is interrupted, so you don't risk data loss the way
 you used to.

 There aren't really masters/slaves in the old sense any more, so
 you have to get out of that thought-mode (it's hard, I know).

 The code is under pretty active development, so any feedback is
 valuable

 Best
 Erick

 On Mon, Feb 27, 2012 at 3:26 AM, roz dev rozde...@gmail.com wrote:
 Hi All,

 I am trying to understand features of Solr Cloud, regarding commits and
 scaling.


   - If I am using Solr Cloud then do I need to explicitly call commit
   (hard-commit)? Or, a soft commit is okay and Solr Cloud will do the job of
   writing to disk?


   - Do We still need to use  Master/Slave setup to scale searching? If we
   have to use Master/Slave setup then do i need to issue hard-commit to make
   my changes visible to slaves?
   - If I were to use NRT with Master/Slave setup with soft commit then
   will the slave be able to see changes made on master with soft commit?

 Any inputs are welcome.

 Thanks

 -Saroj


Solr Cloud, Commits and Master/Slave configuration

2012-02-27 Thread roz dev
Hi All,

I am trying to understand features of Solr Cloud, regarding commits and
scaling.


   - If I am using Solr Cloud then do I need to explicitly call commit
   (hard-commit)? Or, a soft commit is okay and Solr Cloud will do the job of
   writing to disk?


   - Do We still need to use  Master/Slave setup to scale searching? If we
   have to use Master/Slave setup then do i need to issue hard-commit to make
   my changes visible to slaves?
   - If I were to use NRT with Master/Slave setup with soft commit then
   will the slave be able to see changes made on master with soft commit?

Any inputs are welcome.

Thanks

-Saroj


Re: Solr Cloud, Commits and Master/Slave configuration

2012-02-27 Thread Erick Erickson
As I understand it (and I'm just getting into SolrCloud myself), you can
essentially forget about master/slave stuff. If you're using NRT,
the soft commit will make the docs visible, you don't need to do a hard
commit (unlike the master/slave days). Essentially, the update is sent
to each shard leader and then fanned out into the replicas for that
leader. All automatically. Leaders are elected automatically. ZooKeeper
is used to keep the cluster information.
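
As a concrete illustration, with this setup a client doesn't have to send 
commits at all: it can rely on the autoCommit/autoSoftCommit settings, or 
bound visibility per request with commitWithin. A sketch of the XML update 
message (the field names here are made up):

```xml
<add commitWithin="10000">  <!-- make visible within 10 seconds -->
  <doc>
    <field name="id">doc-1</field>
    <field name="title">hello world</field>
  </doc>
</add>
```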

Additionally, SolrCloud keeps a transaction log of the updates, and replays
them if the indexing is interrupted, so you don't risk data loss the way
you used to.

There aren't really masters/slaves in the old sense any more, so
you have to get out of that thought-mode (it's hard, I know).

The code is under pretty active development, so any feedback is
valuable

Best
Erick

On Mon, Feb 27, 2012 at 3:26 AM, roz dev rozde...@gmail.com wrote:
 Hi All,

 I am trying to understand features of Solr Cloud, regarding commits and
 scaling.


   - If I am using Solr Cloud then do I need to explicitly call commit
   (hard-commit)? Or, a soft commit is okay and Solr Cloud will do the job of
   writing to disk?


   - Do We still need to use  Master/Slave setup to scale searching? If we
   have to use Master/Slave setup then do i need to issue hard-commit to make
   my changes visible to slaves?
   - If I were to use NRT with Master/Slave setup with soft commit then
   will the slave be able to see changes made on master with soft commit?

 Any inputs are welcome.

 Thanks

 -Saroj