The current approach for high-cardinality aggregations is the MapReduce approach:

parallel(rollup(search()))

But what Yonik describes would be much more efficient.

Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, Aug 21, 2017 at 3:44 PM, Mikhail Khludnev <m...@apache.org> wrote:
> Thanks for sharing this idea, Yonik!
> I've raised https://issues.apache.org/jira/browse/SOLR-11271.
>
> On Mon, Aug 21, 2017 at 4:00 PM, Yonik Seeley <ysee...@gmail.com> wrote:
> > On Mon, Aug 21, 2017 at 6:01 AM, Mikhail Khludnev <m...@apache.org> wrote:
> > > Hello!
> > >
> > > I need to compute a really wide facet on a 30-shard index with roughly 100M
> > > docs; the facet response has about 100M values and takes 0.5G as a text file.
> > >
> > > So far I have experimented with the old facets. They calculate per-shard
> > > facets fine, but then the node that attempts to merge the 30 responses
> > > fails due to OOM. That's reasonable.
> > >
> > > I suppose I'll get pretty much the same with json.facet, or does it scale
> > > better?
> > >
> > > I want to experiment with Streaming Expressions, which I've never tried yet.
> > > I've found the facet() expression and select() with partitionKeys, but
> > > they'll try to merge facet values in FacetComponent/FacetModule anyway.
> > > Is there a way to merge per-shard facet responses with Streaming?
> >
> > Yeah, I think I've mentioned before that this is the way it should be
> > implemented (per-shard distrib=false facet requests merged by a streaming
> > expression).
> > The JSON Facet "stream" method does stream (i.e. it does not build up the
> > response all in memory first), but only at the shard level and not at
> > the distrib/merge level. This could then be fed into streaming to get
> > exact facets (and streaming facets). But I don't think this has been
> > done yet.
> >
> > -Yonik
>
> --
> Sincerely yours
> Mikhail Khludnev
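For readers following along, the merge Yonik describes can be sketched outside Solr. This is a hypothetical Python illustration, not Solr code: it assumes each shard's distrib=false facet response arrives as a (term, count) stream sorted by term, and shows how a k-way heap merge can sum counts per term without any node ever materializing the full result in memory.

```python
# Hypothetical sketch of merging per-shard facet responses as streams.
# Assumption (not from the thread): each shard emits (term, count) pairs
# sorted by term, e.g. via a distrib=false JSON Facet "stream" request.
import heapq
from itertools import groupby
from operator import itemgetter

def merge_shard_facets(shard_streams):
    """shard_streams: iterables of (term, count) pairs, each sorted by term.
    Yields (term, total_count) in term order, fully streaming."""
    # k-way merge keeps only one pending pair per shard in memory.
    merged = heapq.merge(*shard_streams, key=itemgetter(0))
    # Adjacent equal terms are grouped and their counts summed (the rollup).
    for term, group in groupby(merged, key=itemgetter(0)):
        yield term, sum(count for _, count in group)

# Toy example with three "shards":
shard1 = [("apache", 3), ("lucene", 1), ("solr", 5)]
shard2 = [("apache", 2), ("solr", 4)]
shard3 = [("lucene", 7), ("zookeeper", 1)]
print(list(merge_shard_facets([shard1, shard2, shard3])))
# → [('apache', 5), ('lucene', 8), ('solr', 9), ('zookeeper', 1)]
```

This is essentially what parallel(rollup(...)) does over sorted tuple streams: memory stays proportional to the number of shards, not the number of facet values, which is why it avoids the OOM Mikhail hit when merging 30 full responses.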