In case this helps someone, here is a solution (probably quite efficient already, but I didn't profile it). It reads fields that have DocValues. It also handles fields without DocValues by un-inverting their indexed terms, the way the old FieldCache did.
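Stripped of the Lucene types, the method below is one loop over the index segments. Each segment numbers its documents from 0, `docBase` shifts them into the top-level id space, and deleted documents are skipped via `liveDocs`. Here is a plain-Java model of just that loop. `Segment` is a hypothetical stand-in, not a Lucene class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

class SegmentWalk {
  // Hypothetical stand-in for a Lucene leaf (segment); NOT the Lucene API.
  static final class Segment {
    final int docBase;     // top-level id of this segment's local doc 0
    final int[] values;    // one value per local doc id, like single-valued numeric doc values
    final BitSet liveDocs; // set bit = live doc; null = no deletions in this segment

    Segment(int docBase, int[] values, BitSet liveDocs) {
      this.docBase = docBase;
      this.values = values;
      this.liveDocs = liveDocs;
    }

    int maxDoc() {
      return values.length;
    }
  }

  // Copies each live document's value into an array indexed by top-level doc id.
  // Slots belonging to deleted documents keep the sentinel -1.
  static int[] buildCache(List<Segment> segments, int totalMaxDoc) {
    int[] cache = new int[totalMaxDoc];
    Arrays.fill(cache, -1);
    for (Segment seg : segments) {
      for (int local = 0; local < seg.maxDoc(); local++) {
        if (seg.liveDocs != null && !seg.liveDocs.get(local)) {
          continue; // deleted document: skip it
        }
        cache[seg.docBase + local] = seg.values[local];
      }
    }
    return cache;
  }

  public static void main(String[] args) {
    BitSet live = new BitSet();
    live.set(0);
    live.set(2); // local doc 1 of the second segment is deleted
    List<Segment> segs = new ArrayList<>();
    segs.add(new Segment(0, new int[] {10, 20}, null));
    segs.add(new Segment(2, new int[] {30, 40, 50}, live));
    System.out.println(Arrays.toString(buildCache(segs, 5))); // [10, 20, 30, -1, 50]
  }
}
```

Indexing the cache by `docBase + localId` is what lets a per-segment walk fill one top-level structure.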
    // Imports (Lucene/Solr 6.x). UninvertingReader lives in org.apache.solr.uninverting
    // in later 6.x releases and in org.apache.lucene.uninverting in earlier ones.
    import java.io.IOException;
    import java.util.Collections;
    import java.util.List;
    import java.util.Map;
    import org.apache.lucene.index.*;
    import org.apache.lucene.util.Bits;
    import org.apache.lucene.util.BytesRef;
    import org.apache.lucene.util.LegacyNumericUtils;
    import org.apache.solr.schema.*;
    import org.apache.solr.search.SolrIndexSearcher;
    import org.apache.solr.uninverting.UninvertingReader;
    import org.apache.solr.uninverting.UninvertingReader.Type;

    // KVSetter and Transformer are small helper interfaces not shown in this message;
    // assumed shape:
    //   interface KVSetter    { void set(int docBase, int docId, int v); void set(int docBase, int docId, String v); }
    //   interface Transformer { void process(int docBase, int docId); }

    private void unInvertedTheDamnThing(SolrIndexSearcher searcher, List<String> fields,
        KVSetter setter) throws IOException {
      IndexSchema schema = searcher.getCore().getLatestSchema();
      // Walk the real index segments; docBase maps a segment-local doc id to a top-level one.
      for (LeafReaderContext leaf : searcher.getIndexReader().leaves()) {
        final int docBase = leaf.docBase;
        final LeafReader lr = leaf.reader();
        final Bits liveDocs = lr.getLiveDocs(); // null if the segment has no deletions
        for (String field : fields) {
          FieldInfo fi = lr.getFieldInfos().fieldInfo(field);
          if (fi == null) {
            continue; // field does not occur in this segment
          }
          SchemaField fSchema = schema.getField(field);
          final FieldType ft = fSchema.getType();
          final LeafReader unReader;
          if (fi.getDocValuesType() == DocValuesType.NONE) {
            // No DocValues: un-invert the indexed terms, as the FieldCache used to.
            // Check the Solr field type (not the DocValuesType enum's class).
            Type uninvType;
            if (ft instanceof TextField || ft instanceof StrField) {
              uninvType = fSchema.multiValued() ? Type.SORTED_SET_BINARY : Type.SORTED;
            } else if (ft instanceof TrieIntField) {
              uninvType = fSchema.multiValued() ? Type.SORTED_SET_INTEGER : Type.LEGACY_INTEGER;
            } else {
              continue; // unsupported field type
            }
            Map<String, Type> mapping = Collections.singletonMap(field, uninvType);
            unReader = new UninvertingReader(lr, mapping);
          } else {
            unReader = lr;
          }
          // Switch on the DocValues type the (possibly un-inverting) reader reports,
          // otherwise un-inverted fields would still show up as NONE.
          DocValuesType dvType = unReader.getFieldInfos().fieldInfo(field).getDocValuesType();
          final Transformer transformer;
          switch (dvType) {
            case NUMERIC: {
              final NumericDocValues dv = unReader.getNumericDocValues(field);
              transformer = (base, docId) -> setter.set(base, docId, (int) dv.get(docId));
              break;
            }
            case SORTED_NUMERIC: {
              final SortedNumericDocValues dv = unReader.getSortedNumericDocValues(field);
              transformer = (base, docId) -> {
                dv.setDocument(docId);
                for (int i = 0; i < dv.count(); i++) {
                  setter.set(base, docId, (int) dv.valueAt(i));
                }
              };
              break;
            }
            case SORTED_SET: {
              final SortedSetDocValues dv = unReader.getSortedSetDocValues(field);
              final boolean trieInt = ft instanceof TrieIntField; // terms are prefix-coded ints
              transformer = (base, docId) -> {
                dv.setDocument(docId);
                for (long ord = dv.nextOrd(); ord != SortedSetDocValues.NO_MORE_ORDS; ord = dv.nextOrd()) {
                  BytesRef term = dv.lookupOrd(ord);
                  if (trieInt) {
                    setter.set(base, docId, LegacyNumericUtils.prefixCodedToInt(term));
                  } else {
                    setter.set(base, docId, term.utf8ToString());
                  }
                }
              };
              break;
            }
            case SORTED: {
              final SortedDocValues dv = unReader.getSortedDocValues(field);
              transformer = (base, docId) -> {
                BytesRef v = dv.get(docId);
                if (v.length > 0) { // an empty BytesRef means "no value" for this doc
                  setter.set(base, docId, v.utf8ToString());
                }
              };
              break;
            }
            default:
              throw new IllegalArgumentException(
                  "The field " + field + " is of a type that cannot be un-inverted");
          }
          for (int docId = 0; docId < lr.maxDoc(); docId++) {
            if (liveDocs != null && !liveDocs.get(docId)) {
              continue; // skip deleted documents
            }
            transformer.process(docBase, docId);
          }
        }
      }
    }

On Wed, Aug 17, 2016 at 1:22 PM, Roman Chyla <roman.ch...@gmail.com> wrote:
> Joel, thanks, but which of them? I've counted at least 4, if not more,
> different ways to get DocValues. Are there many functionally equivalent
> approaches just because devs can't agree on one API? Or is there a
> deeper reason?
>
> Btw, the FieldCache is still there, both in Lucene (to be deprecated)
> and in Solr, but it has become package-private.
>
> This is what removed the FieldCache:
> https://issues.apache.org/jira/browse/LUCENE-5666
> This is what followed: https://issues.apache.org/jira/browse/SOLR-8096
>
> And there is still code which un-inverts data from an index if no
> doc-values are available.
>
> --roman
>
> On Tue, Aug 16, 2016 at 9:54 PM, Joel Bernstein <joels...@gmail.com> wrote:
>> You'll want to use org.apache.lucene.index.DocValues. The DocValues API has
>> replaced the field cache.
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>> On Tue, Aug 16, 2016 at 8:18 PM, Roman Chyla <roman.ch...@gmail.com> wrote:
>>
>>> I need to read data from the index in order to build a special cache.
>>> Previously, in Solr 4, this was accomplished with FieldCache or
>>> DocTermOrds.
>>>
>>> Now I'm struggling to see which API to use; there are many of them:
>>>
>>> on the Lucene level:
>>>
>>> UninvertingReader.getNumericDocValues (and others)
>>> <IndexReader>.getNumericValues()
>>> MultiDocValues.getNumericValues()
>>> MultiFields.getTerms()
>>>
>>> on the Solr level:
>>>
>>> reader.getNumericValues()
>>> UninvertingReader.getNumericDocValues()
>>> and extensions to FilterLeafReader, e.g. the very interesting but
>>> undocumented facet accumulators (ex: NumericAcc)
>>>
>>> I need this for Solr, and ideally I'd re-use the existing cache (i.e. the
>>> special cache uses other fields, so those get loaded only once
>>> and reused in the old Solr, which is a win-win situation).
>>>
>>> If I use reader.getValues() or FilterLeafReader, will I be reading data
>>> every time the object is created? What would be the best way to read
>>> data only once?
>>>
>>> Thanks,
>>>
>>> --roman
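On the last question (reading the data only once): a common pattern in Lucene is to cache per segment rather than per top-level reader. Segments are immutable, so after a reopen only the new segments need to be read. Assuming Lucene 5.x/6.x, the usual key for such a cache is the segment's `LeafReader.getCoreCacheKey()`. The sketch below models only the idea in plain Java. `Object` keys stand in for segment core keys, and the loader function is hypothetical (in real code it would read one segment's DocValues).

```java
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

// Plain-Java model of a per-segment cache; keys stand in for segment core keys.
class PerSegmentCache<V> {
  // Weak keys: once a segment is merged away and nothing references its key,
  // the entry becomes eligible for garbage collection.
  private final Map<Object, V> bySegment = new WeakHashMap<>();
  private final Function<Object, V> loader; // hypothetical: reads one segment's values
  private int loads = 0;

  PerSegmentCache(Function<Object, V> loader) {
    this.loader = loader;
  }

  // Returns the cached values for a segment, loading them only on first access.
  synchronized V get(Object segmentKey) {
    V v = bySegment.get(segmentKey);
    if (v == null) {
      v = loader.apply(segmentKey);
      loads++;
      bySegment.put(segmentKey, v);
    }
    return v;
  }

  synchronized int loadCount() {
    return loads;
  }

  public static void main(String[] args) {
    Object segA = new Object(), segB = new Object(), segC = new Object();
    PerSegmentCache<int[]> cache = new PerSegmentCache<>(key -> new int[] {42});
    cache.get(segA); // first searcher sees segments A and B
    cache.get(segB);
    cache.get(segA); // after a reopen, A is unchanged and served from the cache
    cache.get(segC); // only the new segment C is read
    System.out.println(cache.loadCount()); // prints 3
  }
}
```

The design choice is that the cache never depends on the top-level reader, so reopening a searcher costs only the new segments' load time.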