Thanks for sharing this idea, Yonik! I've raised https://issues.apache.org/jira/browse/SOLR-11271.
On Mon, Aug 21, 2017 at 4:00 PM, Yonik Seeley <ysee...@gmail.com> wrote:
> On Mon, Aug 21, 2017 at 6:01 AM, Mikhail Khludnev <m...@apache.org> wrote:
> > Hello!
> >
> > I need to count a really wide facet on a 30-shard index with roughly 100M
> > docs; the facet response is about 100M values and takes 0.5G as a text file.
> >
> > So far I have experimented with the old facets. They calculate per-shard
> > facets fine, but then the node that attempts to merge those 30 responses
> > fails due to OOM. That's reasonable.
> >
> > I suppose I'll get pretty much the same with json.facet, or does it scale
> > better?
> >
> > I want to experiment with Streaming Expressions, which I've never tried yet.
> > I've found the facet() expression and select() with partitionKeys, but they'll
> > try to merge facet values in FacetComponent/Module anyway.
> > Is there a way to merge per-shard facet responses with Streaming?
>
> Yeah, I think I've mentioned before that this is the way it should be
> implemented (a per-shard distrib=false facet request merged by a streaming
> expression).
> The JSON Facet "stream" method does stream (i.e. it does not build up the
> response all in memory first), but only at the shard level and not at
> the distrib/merge level. This could then be fed into streaming to get
> exact facets (and streaming facets). But I don't think this has been
> done yet.
>
> -Yonik

--
Sincerely yours
Mikhail Khludnev
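For anyone following along, here is a minimal sketch (not Solr code, just an illustration) of the merge Yonik describes: each shard is queried with distrib=false and streams its facet buckets sorted by term, and the coordinator does a k-way merge that sums counts per term. The shard data below is faked in memory; in practice each iterator would wrap a streamed per-shard response.

```python
import heapq

# Hypothetical per-shard facet streams: each shard yields (term, count)
# pairs sorted by term, the way a per-shard distrib=false request with the
# JSON Facet "stream" method would. Faked here with in-memory lists.
shard_streams = [
    iter([("apache", 3), ("lucene", 5), ("solr", 2)]),
    iter([("apache", 1), ("solr", 7), ("zookeeper", 4)]),
]

def merge_facets(streams):
    """K-way merge of per-shard (term, count) streams sorted by term,
    summing counts for equal terms. Memory use is O(#shards), not
    O(#distinct terms), so 100M values never sit in RAM at once."""
    merged = heapq.merge(*streams, key=lambda tc: tc[0])
    cur_term, cur_count = None, 0
    for term, count in merged:
        if term == cur_term:
            cur_count += count
        else:
            if cur_term is not None:
                yield (cur_term, cur_count)
            cur_term, cur_count = term, count
    if cur_term is not None:
        yield (cur_term, cur_count)

for term, count in merge_facets(shard_streams):
    print(term, count)
```

The point of the heap-based merge is exactly the OOM fix discussed above: no single node ever materializes all 30 shard responses, only one bucket per shard at a time.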