You can find id terms that repeat in an index via the Terms Component (https://solr.apache.org/guide/solr/latest/query-guide/terms-component.html) with terms.fl=id and terms.mincount=2, or do the same via facets:

    q=*:*&facet=true&facet.field=id&facet.limit=-1&facet.mincount=2

(just off the top of my head). Then you can query the duplicated ids one by one.

If you don't have a strictly unique field assigned, it's not possible to drop duplicates. You can get the internal unique identifier, a kind of analogy to ROW_NUMBER, via the [docid] transformer, see https://solr.apache.org/guide/solr/latest/query-guide/document-transformers.html#docid-docidaugmenterfactory . But I'm not aware of any query that accepts this number.
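To make the facet approach concrete, here is a rough Python sketch of building that request and reading the duplicated values out of the response. The function names, the base URL and the core name "mycore" are made up for illustration, and rows=0 and wt=json are my additions to keep the response small. It assumes Solr's default JSON facet layout, a flat list alternating term and count; if json.nl is set differently, the parsing needs to change.

```python
from urllib.parse import urlencode


def facet_query_url(base_url: str, field: str = "id") -> str:
    """Build a /select URL that facets on `field` and keeps only repeated values.

    `base_url` is assumed to point at a core, e.g. http://localhost:8983/solr/mycore
    (hypothetical host and core name).
    """
    params = {
        "q": "*:*",
        "rows": 0,             # assumption: only the facet counts are needed
        "facet": "true",
        "facet.field": field,
        "facet.limit": -1,     # no cap on the number of facet values returned
        "facet.mincount": 2,   # only values that occur at least twice
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"


def duplicated_values(response: dict, field: str = "id") -> dict:
    """Return {value: count} for values seen more than once.

    Assumes the default flat layout: [term1, count1, term2, count2, ...].
    """
    flat = response["facet_counts"]["facet_fields"][field]
    pairs = zip(flat[0::2], flat[1::2])
    return {term: count for term, count in pairs if count >= 2}


if __name__ == "__main__":
    print(facet_query_url("http://localhost:8983/solr/mycore"))
    # Hand-written sample in the shape of a Solr response, not real data.
    sample = {"facet_counts": {"facet_fields": {"id": ["doc-1", 3, "doc-7", 2]}}}
    print(duplicated_values(sample))
```

Fetching the URL is left out on purpose; any HTTP client will do. Each value returned can then be queried on its own to inspect the duplicates.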
On Sun, Oct 22, 2023 at 3:22 PM Vince McMahon <sippingonesandze...@gmail.com> wrote:

> I have a SOLR 8.X. I suspect one of the core has duplicates and wants to
> remove the duplicated documents. Signature, as in the SOLR guide, is not
> implemented. https://solr.apache.org/guide/6_6/de-duplication.html
>
> in sql, a query without the use of a hash column will be liked:
> ;WITH CTE AS
> (
>     SELECT cols,
>         RN = ROW_NUMBER() OVER( PARTITION BY cols
>                                 ORDER BY updated DESC)
>     FROM [table]
> )
> DELETE FROM CTE
> WHERE RN > 1
>
> what would be the syntax for SOLR query?
>

--
Sincerely yours
Mikhail Khludnev