Re: Commits and new document visibility

2019-03-14 Thread Christopher Schultz

Shawn,

On 3/14/19 10:46, Shawn Heisey wrote:
> On 3/14/2019 8:23 AM, Christopher Schultz wrote:
>> I believe that the only thing I want to do is to set the 
>> autoSoftCommit value to something "reasonable". I'll probably
>> start with maybe 15000 (15sec) to match the hard-commit setting
>> and see if we get any complaints about delays between "save" and
>> "seeing the user".
> 
> In my opinion, 15 seconds is far too frequent for opening a new 
> searcher.  If the index reaches any real size, you may be in a
> situation where the full soft commit takes longer than 15 seconds
> to complete - mostly due to warming or autowarming.  Commits that
> open a searcher can be very resource-intensive ... if they happen
> too frequently, then heavy indexing will cause your Solr instance
> to never "calm down" ... it will always be hitting the CPU and disk
> hard. I'd personally start with one minute and adjust from there
> based on how long the commits take.

Okay. Current core size is ~1M documents. I think users can live with
a 1-minute delay, but I'll have to ask :)

Is the log file the best resource for information on (soft)
commit-duration?

>> In our case, we don't have a huge number of documents being
>> created in a minute. Probably once per minute, if that.
>> 
>> Does that seem reasonable?
>> 
>> As for actually SETTING the setting, I'd prefer not to edit the 
>> solrconfig.xml document. Instead, can I set this in my
>> solr.in.sh script? I see an example like this right in the file:
>> 
>> SOLR_OPTS="$SOLR_OPTS -Dsolr.autoSoftCommit.maxTime=3000"
> 
> 3 seconds is even more problematic than 15.

Sorry, that was just a copy/paste directly from the default solr.in.sh
script that ships with Solr. I wouldn't do a 3-second soft-commit.

> I believe that when you use "bin/solr create" to create an index
> with the default config, that it does set the autoSoftCommit to 3
> seconds. Which as I stated, I believe to be far too frequent.

Nope, it sets it to "never soft commit", unless the defaults have
changed since I built this service with, I think, 7.3.0.

Is there any way to change this value at runtime, or does it require a
service-restart?
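
On the runtime question: one route that should avoid a full service restart is Solr's Config API, whose `set-property` command accepts `updateHandler.autoSoftCommit.maxTime`. The sketch below only builds the HTTP request with the JDK's `java.net.http` classes (Java 11+); it does not send anything. The base URL, the core name `users`, and the class name are hypothetical, and 60000 ms is simply the one-minute starting point suggested in this thread.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Sketch only: builds a Config API request; host and core name are assumptions. */
final class SoftCommitConfigRequest {

    /** JSON body for the Config API "set-property" command. */
    static String body(long maxTimeMs) {
        return "{\"set-property\":{\"updateHandler.autoSoftCommit.maxTime\":" + maxTimeMs + "}}";
    }

    /** Builds (but does not send) the POST request for a single core. */
    static HttpRequest build(String solrBaseUrl, String core, long maxTimeMs) {
        return HttpRequest.newBuilder(URI.create(solrBaseUrl + "/" + core + "/config"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body(maxTimeMs)))
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical local Solr instance and core name.
        HttpRequest request = build("http://localhost:8983/solr", "users", 60_000);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(body(60_000));
    }
}
```

As far as I know, the Config API stores the override in that core's configoverlay.json and reloads the core, so the change is per core rather than global to the Solr instance.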

-chris


Re: Commits and new document visibility

2019-03-14 Thread Shawn Heisey

On 3/14/2019 8:23 AM, Christopher Schultz wrote:

> I believe that the only thing I want to do is to set the
> autoSoftCommit value to something "reasonable". I'll probably start
> with maybe 15000 (15sec) to match the hard-commit setting and see if
> we get any complaints about delays between "save" and "seeing the user".


In my opinion, 15 seconds is far too frequent for opening a new 
searcher.  If the index reaches any real size, you may be in a situation 
where the full soft commit takes longer than 15 seconds to complete - 
mostly due to warming or autowarming.  Commits that open a searcher can 
be very resource-intensive ... if they happen too frequently, then heavy 
indexing will cause your Solr instance to never "calm down" ... it will 
always be hitting the CPU and disk hard.


I'd personally start with one minute and adjust from there based on how 
long the commits take.
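
The "adjust based on how long the commits take" advice can be turned into a small check. This is only a sketch of the arithmetic: the class and method names are made up for illustration, and the sample durations in `main` are invented placeholders, not measurements from any real core.

```java
import java.time.Duration;
import java.util.Arrays;
import java.util.List;

/** Sketch only: compares a soft-commit interval with observed commit times. */
final class SoftCommitHeadroom {

    /** Longest observed commit (including warming) in the sample. */
    static Duration longest(List<Duration> observed) {
        Duration max = Duration.ZERO;
        for (Duration d : observed) {
            if (d.compareTo(max) > 0) {
                max = d;
            }
        }
        return max;
    }

    /** True when a commit could still be running when the next one is due. */
    static boolean commitsCanOverlap(Duration interval, List<Duration> observed) {
        return longest(observed).compareTo(interval) >= 0;
    }

    /** Share of each interval spent committing, for the slowest observed commit. */
    static double busyFraction(Duration interval, List<Duration> observed) {
        return (double) longest(observed).toMillis() / interval.toMillis();
    }

    public static void main(String[] args) {
        // Invented sample durations, for illustration only.
        List<Duration> observed = Arrays.asList(
                Duration.ofSeconds(4), Duration.ofSeconds(9), Duration.ofSeconds(18));
        System.out.println(commitsCanOverlap(Duration.ofSeconds(15), observed));
        System.out.println(commitsCanOverlap(Duration.ofSeconds(60), observed));
        System.out.println(busyFraction(Duration.ofSeconds(60), observed));
    }
}
```

With these made-up numbers, a 15-second interval can overlap with the slowest commit, while a 60-second interval leaves most of each minute idle.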



> In our case, we don't have a huge number of documents being created in
> a minute. Probably once per minute, if that.
>
> Does that seem reasonable?
>
> As for actually SETTING the setting, I'd prefer not to edit the
> solrconfig.xml document. Instead, can I set this in my solr.in.sh
> script? I see an example like this right in the file:
>
> SOLR_OPTS="$SOLR_OPTS -Dsolr.autoSoftCommit.maxTime=3000"


3 seconds is even more problematic than 15.

I believe that when you use "bin/solr create" to create an index with 
the default config, that it does set the autoSoftCommit to 3 seconds. 
Which as I stated, I believe to be far too frequent.


Thanks,
Shawn


Commits and new document visibility

2019-03-14 Thread Christopher Schultz

All,

I recently had a situation where a document wasn't findable in a
fairly small Solr core/collection and I didn't see any errors in
either the application using Solr or within Solr itself. A Solr
service restart caused the document to become visible.

So I started reading.

I believe the "problem" is that the document was indexed but not
visible due to the default commit settings in Solr 7.5 -- which is the
version I happen to be running right now.

I never bothered to change anything from the defaults because, well, I
didn't know what I was doing. Now that I (a) have a problem to solve
and (b) know a little more about what is happening, I just wanted a
quick sanity-check on what I'd like to do.

[Quick background: my core/collection stores user data so that other
users can quickly find anyone in the system via text-search. This
replaced our previous RDBMS-based "SELECT ... WHERE name LIKE
'%whatever%'" implementation which of course wasn't scaling well.
Generally, users will expect that when a new user is created, they
will be findable "fairly soon" (probably immediately) afterwards.]

We are using SolrJ as a client from our application, btw.

Initially, we were doing:

SolrInputDocument document = ...;
SolrClient solr = ...;
solr.add(document);
solr.commit();

Someone told me that committing after every document-add was wasteful
and it seemed like good advice -- allow Solr's autoCommit mechanism to
handle the commits and we'll get better performance. The problem is
that no new documents become visible unless we take additional action.

So, here are the default settings:

autoCommit   = max 15sec
openSearcher = false

autoSoftCommit = never[*]

This means that every 15 seconds (plus OS/disk sync time), I'll get a
safe snapshot of the data. I'm okay with losing 15 seconds worth of
data if there is some catastrophe.

It also means that my documents are pretty much never made visible.
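
The behaviour described above can be illustrated with a toy model. This is not Solr code: the class and its methods are invented for illustration, and durability is deliberately not modelled. It only captures the rule that documents become searchable when a new searcher is opened, which a hard commit with openSearcher=false never does.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

/** Toy model of commit visibility; illustrative only, not Solr code. */
final class CommitVisibilityModel {
    // Documents accepted by add() but not yet exposed to searches.
    private final Set<String> pending = new LinkedHashSet<>();
    // Documents the current "searcher" can see.
    private final Set<String> visible = new HashSet<>();

    void add(String id) {
        pending.add(id);
    }

    /** Hard commit: exposes pending documents only if openSearcher is true. */
    void hardCommit(boolean openSearcher) {
        if (openSearcher) {
            openNewSearcher();
        }
    }

    /** Soft commit: always opens a new searcher. */
    void softCommit() {
        openNewSearcher();
    }

    boolean isVisible(String id) {
        return visible.contains(id);
    }

    private void openNewSearcher() {
        visible.addAll(pending);
        pending.clear();
    }

    public static void main(String[] args) {
        CommitVisibilityModel core = new CommitVisibilityModel();
        core.add("user-1");
        core.hardCommit(false);                        // autoCommit, openSearcher=false
        System.out.println(core.isVisible("user-1")); // prints "false"
        core.softCommit();                             // what autoSoftCommit would do
        System.out.println(core.isVisible("user-1")); // prints "true"
    }
}
```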

I believe that the only thing I want to do is to set the
autoSoftCommit value to something "reasonable". I'll probably start
with maybe 15000 (15sec) to match the hard-commit setting and see if
we get any complaints about delays between "save" and "seeing the user".

In our case, we don't have a huge number of documents being created in
a minute. Probably once per minute, if that.

Does that seem reasonable?

As for actually SETTING the setting, I'd prefer not to edit the
solrconfig.xml document. Instead, can I set this in my solr.in.sh
script? I see an example like this right in the file:

SOLR_OPTS="$SOLR_OPTS -Dsolr.autoSoftCommit.maxTime=3000"

Is that a fairly standard way to set the autoSoftCommit value for all
cores?
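
For context on why a `-D` option can reach the commit settings at all: this only works if solrconfig.xml refers to the property through a placeholder such as `${solr.autoSoftCommit.maxTime:-1}`, which I believe is what the default configset contains. The sketch below is a hypothetical re-implementation of that `${name:default}` substitution, written purely to show the fallback behaviour; it is not Solr's own parser.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch only: mimics ${name:default} substitution; not Solr's implementation. */
final class PropertyPlaceholder {
    // Matches ${name} or ${name:default}; group 1 is the name, group 2 the default.
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{([^:}]+)(?::([^}]*))?\\}");

    static String resolve(String text, Properties props) {
        Matcher m = PLACEHOLDER.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String fallback = m.group(2) == null ? "" : m.group(2);
            String value = props.getProperty(m.group(1), fallback);
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String template = "${solr.autoSoftCommit.maxTime:-1}";
        // No property set: the default after the colon (-1, i.e. "never") is used.
        System.out.println(resolve(template, new Properties()));
        // Property set, as -Dsolr.autoSoftCommit.maxTime=60000 would do.
        Properties props = new Properties();
        props.setProperty("solr.autoSoftCommit.maxTime", "60000");
        System.out.println(resolve(template, props));
    }
}
```

Note that the `:-1` here is a colon followed by the default value `-1`, not the shell's `:-` operator. If a core's solrconfig.xml hard-codes the value instead of using the placeholder, the setting in solr.in.sh would presumably have no effect for that core.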

Thanks,
-chris

[*] This setting is documented only in a single place: in the
"near-real-time" documentation. It would be nice if that special value
were called out in other places so that it isn't so hard to find.