On 7/6/22 04:32, Michał Świątkowski wrote:
I checked, and the collection data is erased only when I use clean=true
together with optimize=true (the first query below).
1. clean=true ; optimize=true
webapp=/solr path=/dataimport
params={core=example_collection&optimize=true&indent=on&commit=true&name=dataimport&clean=true&wt=json&command=full-import&_=1657098443936&verbose=true}
status=0 QTime=5
2. clean=true
webapp=/solr path=/dataimport
params={core=example_collection&indent=on&commit=true&name=dataimport&clean=true&wt=json&command=full-import&_=1657098443936&verbose=true}
status=0 QTime=4
3. clean=false ; optimize=true
webapp=/solr path=/dataimport
params={core=example_collection&optimize=true&indent=on&commit=true&name=dataimport&clean=false&wt=json&command=full-import&_=1657098443936&verbose=true}
status=0 QTime=5
If you send clean=true, then DIH should wipe the index data before it
begins importing. If you set optimize=true, then Solr should optimize
the index AFTER the import is done. It is very odd for the behavior to
change depending on the combination of parameters ... my guess is that
when both parameters are true, DIH does a commit BEFORE importing
begins, which makes the deletes from clean=true visible right away,
while without that combination the only commit happens after the import
finishes.
It might be better to set commit and optimize to false and do those
operations manually after the import completes. Just an FYI ...
optimizing is generally not recommended because of how long it can take
and how many system resources it uses.
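For example, a rough sketch of that sequence (the localhost:8983
host/port is an assumption about your install; the core name is taken
from your logs, and these use DIH's standard commands plus the regular
update handler):

  # start the import with no automatic commit or optimize
  http://localhost:8983/solr/example_collection/dataimport?command=full-import&clean=true&commit=false&optimize=false

  # poll until the import reports it is finished
  http://localhost:8983/solr/example_collection/dataimport?command=status

  # then commit to make the changes visible
  http://localhost:8983/solr/example_collection/update?commit=true

  # and only if you really need it:
  http://localhost:8983/solr/example_collection/update?optimize=true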
Note that DIH is no longer present in Solr 9.x. The feature was
deprecated and then removed because it has long-standing problems,
especially in cloud mode. You seem to have stumbled onto one of the
many bugs DIH has.
You may have greater luck with the separate version of DIH:
https://github.com/rohitbemax/dataimporthandler
You can also do the import into a brand new collection and then update
an alias so that the "true" collection name points to the new one after
indexing is complete. This is a good paradigm to use in general.
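A sketch of that alias swap with the Collections API (the alias and
collection names here are hypothetical):

  # index into a fresh collection, e.g. products_20220706, then repoint
  # the alias that your applications query:
  http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=products&collections=products_20220706

CREATEALIAS on an existing alias replaces it atomically, so queries
against "products" switch to the new index with no downtime, and the
old collection can be deleted once you have verified the new one.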
Thanks,
Shawn