The current approach for high-cardinality aggregations is the MapReduce
approach:

parallel(rollup(search()))
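
Spelled out as a full expression (collection name, facet field, and worker
count here are hypothetical), that looks something like the sketch below:
each worker exports its partition of the documents sorted by the field,
rolls its partition up, and the sorted partial rollups are merged on the
way back. The partitionKeys ensure each distinct value lands wholly on one
worker, so the partial counts are exact.

    parallel(collection1,
             rollup(search(collection1,
                           q="*:*",
                           qt="/export",
                           fl="cat",
                           sort="cat asc",
                           partitionKeys="cat"),
                    over="cat",
                    count(*)),
             workers=4,
             sort="cat asc")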

But what Yonik describes would be much more efficient.
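
Concretely, the per-shard half of what Yonik describes below would be a
plain distrib=false request using the JSON Facet "stream" method against
each shard (host, core, and field names here are hypothetical); the
merge-by-streaming-expression half is the part that doesn't exist yet:

    curl 'http://host1:8983/solr/collection1_shard1_replica_n1/select' \
         -d 'q=*:*&rows=0&distrib=false' \
         -d 'json.facet={cats:{type:terms,field:cat,limit:-1,sort:"index asc",method:stream}}'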


Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, Aug 21, 2017 at 3:44 PM, Mikhail Khludnev <m...@apache.org> wrote:

> Thanks for sharing this idea, Yonik!
> I've raised https://issues.apache.org/jira/browse/SOLR-11271.
>
> On Mon, Aug 21, 2017 at 4:00 PM, Yonik Seeley <ysee...@gmail.com> wrote:
>
> > On Mon, Aug 21, 2017 at 6:01 AM, Mikhail Khludnev <m...@apache.org>
> > wrote:
> > > Hello!
> > >
> > > I need to compute a really wide facet on a 30-shard index with roughly
> > > 100M docs; the facet response is about 100M values and takes 0.5G as a
> > > text file.
> > >
> > > So far I've experimented with the old facets. They calculate per-shard
> > > facets fine, but then the node that attempts to merge the 30 responses
> > > fails due to OOM. That's reasonable.
> > >
> > > I suppose I'll get pretty much the same with json.facet, or does it
> > > scale better?
> > >
> > > I want to experiment with Streaming Expressions, which I've never used
> > > before. I've found the facet() expression and select() with
> > > partitionKeys, but they'll try to merge the facet values in
> > > FacetComponent/Module anyway.
> > > Is there a way to merge per-shard facet responses with Streaming?
> >
> > Yeah, I think I've mentioned before that this is the way it should be
> > implemented (per-shard distrib=false facet requests merged by a
> > streaming expression).
> > The JSON Facet "stream" method does stream (i.e. does not build up the
> > response all in memory first), but only at the shard level and not at
> > the distrib/merge level.  This could then be fed into streaming to get
> > exact facets (and streaming facets).  But I don't think this has been
> > done yet.
> >
> > -Yonik
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>
