We are using SolrCloud 4.10.3-cdh5.4.5, which is part of Cloudera CDH 5.4.5.
Our collection (one shard with three replicas) became really big, and we
decided to delete some old records to improve performance (tests in our
staging environment showed that after reaching 500 million records the
index becomes very slow and Solr is less responsive). After deleting about
100 million records (out of 260 million), they were still shown as
"Deleted Docs" on the Solr Admin statistics page, which also showed
"Optimized: No (red)" and "Current: No (red)". In theory, having 100
million deleted (but not removed) records is itself a performance issue.
What we found in the Solr forums was that the only way to remove deleted
records is to optimize the index.
We knew that optimization is not a good idea; it has even been discussed in
the forums that it should be removed entirely from the API and Solr Admin.
But discussing is one thing and doing it is another. To make a long story
short, we tried to optimize through the Solr API to remove the deleted
records. All three replicas of the collection were merged down to 18
segments, and Solr Admin showed "Optimized: Yes (green)", but the deleted
records were not removed (which is either an inconsistency in Solr Admin or
a bug in the API).
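The exact request is not reproduced here, but a forced merge through Solr's
standard update handler typically looks like the sketch below. The host,
port, and collection name are placeholders, and whether maxSegments was
passed is my assumption (it would explain ending up with 18 segments):

```python
from urllib.parse import urlencode

def optimize_url(base_url, collection, max_segments=None):
    """Build the URL for Solr's update-handler optimize command.

    base_url and collection are placeholders, not the values from our
    cluster; max_segments is optional and shown only for illustration.
    """
    params = {"optimize": "true", "waitSearcher": "true"}
    if max_segments is not None:
        params["maxSegments"] = str(max_segments)
    return "%s/%s/update?%s" % (base_url, collection, urlencode(params))

# Sending a GET (or POST) to this URL asks Solr to merge segments and
# drop deleted documents:
print(optimize_url("http://localhost:8983/solr", "mycollection", 18))
```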
Finally, because people usually trust features found in the UI (even when
official documentation for them cannot be found), the "Optimize Now" button
in Solr Admin was pressed. It removed all deleted records and made the
collection look very good (in the UI). Here is the result:
1. The index was reduced to one large (60 GB) segment (some people consider
this a good thing, but I doubt it).
2. Our use case involves batch updates followed by a soft commit (after
which users see the results). A commit operation that used to take about
1.5 minutes now takes 12 to 25 minutes.
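As a rough sketch of that workflow (again assuming the standard update
handler; host and collection name are placeholders), the soft commit after
a batch of updates is requested like this:

```python
from urllib.parse import urlencode

def soft_commit_url(base_url, collection):
    # commit=true with softCommit=true makes newly indexed documents
    # visible to searchers without forcing new segments to disk.
    params = {"commit": "true", "softCommit": "true"}
    return "%s/%s/update?%s" % (base_url, collection, urlencode(params))

print(soft_commit_url("http://localhost:8983/solr", "mycollection"))
```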
Overall performance of our application is severely degraded.
I am not going to dwell on how confusing Solr optimization is, but I am
asking whether anyone knows *what caused the commit operation to become so
slow after optimization*. If the issue is having one large segment, how can
that segment be split into smaller ones (without sharding)?
Sent from the Solr - User mailing list archive at Nabble.com.