: Is it possible to access collection statistics - especially IDF values
: for all non-discarded terms in the current document - from within an
: implementation of the Signature class?

The Signature API just lets you compute a unique value from a pile of Strings, but you could extend the SignatureUpdateProcessorFactory to only give the Signature class specific field values based on IDF values (which are available to the SignatureUpdateProcessorFactory via the IndexReader, via the SolrCore, via the SolrQueryRequest).

The complication you will run into with an approach like this is that the UpdateProcessor pipeline happens before Analysis (it has to, since it might be adding/removing fields from the documents), so the String values haven't been tokenized yet and you can't easily look up the IDF of the terms in the doc ... you'd have to do your own preliminary Analysis of the raw field values.

-Hoss
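
A rough sketch of the "preliminary Analysis" step described above, in plain Java with no Solr dependency. Everything in it is an assumption for illustration: the class name IdfTermSelector, the crude tokenizer (lowercase, split on non-alphanumerics), and the smoothed IDF formula are hypothetical stand-ins, not Solr's API. In a real update processor you would presumably run the field's own Analyzer and get the counts from the IndexReader (for example docFreq(Term) and numDocs()); here they are passed in as a plain Map and int.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper, not part of Solr or Lucene. */
final class IdfTermSelector {

    /** Crude stand-in for real Analysis: lowercase, split on non-alphanumerics, dedupe. */
    static Set<String> tokenize(String rawFieldValue) {
        Set<String> terms = new LinkedHashSet<>();
        for (String t : rawFieldValue.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    /** Assumed smoothed IDF: ln(numDocs / (docFreq + 1)) + 1. */
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (docFreq + 1.0)) + 1.0;
    }

    /**
     * Returns up to maxTerms distinct terms from the raw value, highest IDF first.
     * Terms missing from docFreqs are treated as docFreq 0 (rarest); ties are
     * broken alphabetically so the result is deterministic.
     */
    static List<String> topTermsByIdf(String rawFieldValue,
                                      Map<String, Integer> docFreqs,
                                      int numDocs,
                                      int maxTerms) {
        List<String> terms = new ArrayList<>(tokenize(rawFieldValue));
        terms.sort(Comparator
                .comparingDouble((String t) -> -idf(docFreqs.getOrDefault(t, 0), numDocs))
                .thenComparing(Comparator.<String>naturalOrder()));
        return new ArrayList<>(terms.subList(0, Math.min(maxTerms, terms.size())));
    }
}
```

The selected terms (rather than the full raw field values) would then be what the extended processor hands to the Signature implementation. Note that the tokenizer here will not match what the field's real analysis chain produces (stemming, stopwords, etc.), so the looked-up frequencies are only as good as that approximation.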