Thanks for sharing this idea, Yonik! I've raised https://issues.apache.org/jira/browse/SOLR-11271.
On Mon, Aug 21, 2017 at 4:00 PM, Yonik Seeley <ysee...@gmail.com> wrote:
> On Mon, Aug 21, 2017 at 6:01 AM, Mikhail Khludnev <m...@apache.org> wrote:
> > Hello!
> >
> > I need to count a really wide facet on a 30-shard index with roughly 100M
> > docs; the facet response is about 100M values and takes 0.5G as a text file.
> >
> > So far I have experimented with the old facets. They calculate per-shard
> > facets fine, but then the node that attempts to merge those 30 responses
> > fails due to OOM. That's reasonable.
> >
> > I suppose I'll get pretty much the same with json.facet, or does it scale
> > better?
> >
> > I want to experiment with Streaming Expressions, which I've never tried yet.
> > I've found the facet() expression and select() with partitionKeys, but they'll
> > try to merge facet values in FacetComponent/Module anyway.
> > Is there a way to merge per-shard facet responses with Streaming?
>
> Yeah, I think I've mentioned before that this is the way it should be
> implemented (a per-shard distrib=false facet request merged by a streaming
> expression).
> The JSON Facet "stream" method does stream (i.e. it does not build up the
> response all in memory first), but only at the shard level and not at
> the distrib/merge level. This could then be fed into streaming to get
> exact facets (and streaming facets). But I don't think this has been
> done yet.
>
> -Yonik

--
Sincerely yours
Mikhail Khludnev
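For anyone following along, here is a minimal sketch (not Solr code, just an illustration) of the merge Yonik describes: each shard is queried with distrib=false and streams its facet buckets sorted by term, and the coordinator does a k-way merge that sums counts per term. The shard data below is faked in memory; in practice each iterator would wrap a streamed per-shard response.

```python
import heapq

# Hypothetical per-shard facet streams: each shard yields (term, count)
# pairs sorted by term, the way a per-shard distrib=false request with the
# JSON Facet "stream" method would. Faked here with in-memory lists.
shard_streams = [
    iter([("apache", 3), ("lucene", 5), ("solr", 2)]),
    iter([("apache", 1), ("solr", 7), ("zookeeper", 4)]),
]

def merge_facets(streams):
    """K-way merge of per-shard (term, count) streams sorted by term,
    summing counts for equal terms. Memory use is O(#shards), not
    O(#distinct terms), so 100M values never sit in RAM at once."""
    merged = heapq.merge(*streams, key=lambda tc: tc[0])
    cur_term, cur_count = None, 0
    for term, count in merged:
        if term == cur_term:
            cur_count += count
        else:
            if cur_term is not None:
                yield (cur_term, cur_count)
            cur_term, cur_count = term, count
    if cur_term is not None:
        yield (cur_term, cur_count)

for term, count in merge_facets(shard_streams):
    print(term, count)
```

The point of the heap-based merge is exactly the OOM fix discussed above: no single node ever materializes all 30 shard responses, only one bucket per shard at a time.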