Re: Multiple rollups/facets in one streaming aggregation?

Peter Shmukler Sun, 30 Jul 2017 07:42:51 -0700

I need to improve the user experience of facet calculation.
Let's assume we have time-partitioned collections:
Partition1, Partition2, Partition3, ...
The alias AliasAllPartitions unifies all partitions.
Running facets on AliasAllPartitions is a very heavy synchronous operation;
the user has to wait a long time for the first result.


My suggestion is to process the partitions one after another and return
partial results at certain points.
This could be relevant for any aggregate, faceting, or count-distinct function.
I actually only need an estimate of the facets, so I can use a Count-Min Sketch
and HLL (HyperLogLog) to keep memory consumption reasonable.
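
For readers unfamiliar with the data structure, here is a minimal Count-Min Sketch in Python. It is only an illustration of the idea, not the prototype's code: the class name, the `width` and `depth` defaults, and the use of salted BLAKE2b hashes are all assumptions made for this sketch. Memory stays fixed at `width * depth` counters no matter how many distinct names are seen, and estimates can be too high but never too low.

```python
import hashlib


class CountMinSketch:
    """Fixed-size frequency estimator (hypothetical helper, not a Solr class)."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, key):
        # One column per row, chosen by hashing the key with a per-row salt.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                key.encode("utf-8"),
                digest_size=8,
                salt=row.to_bytes(8, "little"),
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, key, count=1):
        for row, col in self._cells(key):
            self.table[row][col] += count

    def estimate(self, key):
        # Collisions only ever inflate a counter, so the minimum is the best guess.
        return min(self.table[row][col] for row, col in self._cells(key))

    def merge(self, other):
        # Sketches with identical dimensions combine by cell-wise addition,
        # which is what makes per-partition sketches composable.
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("sketch dimensions differ")
        for row in range(self.depth):
            for col in range(self.width):
                self.table[row][col] += other.table[row][col]
```

Because `merge` is plain addition, one sketch per partition could be built independently and folded into a running total as each partition finishes.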
The interface could look like this:
CMSFacet(
  list(
    search(partition1,q=*:*,fl="author,name,price",qt="/export",sort="name asc"),
    search(partition2,q=*:*,fl="author,name,price",qt="/export",sort="name asc"),
    search(partition3,q=*:*,fl="author,name,price",qt="/export",sort="name asc"),
    search(partition4,q=*:*,fl="author,name,price",qt="/export",sort="name asc"),
    search(partition5,q=*:*,fl="author,name,price",qt="/export",sort="name asc")
  ),
  bucketSizeLimit=150, sizeLimit=400, sum(price), min(price), CMScount(name)
)

Expected output:
{
  "result-set": {
    "docs": [
      {
        "min(price)": "215464",
        "sum(price)": "23545846",
        "CMScount(name)": {"rows":149,"facet":[
          {"A Clash of Kings28":4},{"A Clash of Kings16":4},{"A Clash of Kings27":4},{"A Clash of Kings15":4},
          {"A Clash of Kings26":4},{"A Clash of Kings14":4},{"A Clash of Kings25":4},{"A Clash of Kings19":4},
          {"A Clash of Kings18":4},{"A Clash of Kings29":4},{"A Game of Thrones18":6},{"A Clash of Kings20":4},
          {"A Clash of Kings13":4},{"A Clash of Kings24":4},{"A Clash of Kings12":4},{"A Clash of Kings23":4},
          {"A Clash of Kings22":4},{"A Clash of Kings10":4},{"A Clash of Kings21":4},{"A Clash of Kings5":4}
        ]}
      },
      {
        "min(price)": "655464",
        "sum(price)": "3584684646846",
        "CMScount(name)": {"rows":299,"facet":[
          {"A Storm of Swords18":8},{"A Game of Thrones18":8},{"A Game of Thrones28":7},{"A Game of Thrones27":7},
          {"A Game of Thrones24":5},{"A Game of Thrones3":11},{"A Game of Thrones4":10},{"A Game of Thrones6":8},
          {"A Storm of Swords20":7},{"A Game of Thrones8":6},{"A Game of Thrones9":7},{"A Storm of Swords11":8},
          {"A Storm of Swords22":8},{"A Storm of Swords21":10},{"A Storm of Swords13":8},{"A Storm of Swords24":8},
          {"A Storm of Swords23":13},{"A Storm of Swords15":7},{"A Storm of Swords26":8},{"A Storm of Swords27":7}
        ]}
      },
      {
        "min(price)": -214.87158,
        "sum(price)": -40523.873622472,
        "CMScount(name)": {"rows":399,"facet":[
          {"A Storm of Swords18":12},{"A Game of Thrones18":12},{"A Game of Thrones28":11},{"A Game of Thrones27":11},
          {"A Game of Thrones24":15},{"A Game of Thrones3":11},{"A Game of Thrones4":10},{"A Game of Thrones6":12},
          {"A Storm of Swords20":7},{"A Game of Thrones8":6},{"A Game of Thrones9":7},{"A Storm of Swords11":12},
          {"A Storm of Swords22":8},{"A Storm of Swords21":10},{"A Storm of Swords13":12},{"A Storm of Swords24":12},
          {"A Storm of Swords23":13},{"A Storm of Swords15":7},{"A Storm of Swords26":12},{"A Storm of Swords27":11}
        ]}
      },
      {
        "EOF": true,
        "RESPONSE_TIME": 4381
      }
    ]
  }
}
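
To make the intended behaviour concrete, the loop below sketches how cumulative snapshots like the ones above might be produced. It is a simplified Python model, not the Solr prototype: the function name `partial_facets`, the `top_n` parameter, and the choice to emit a snapshot every `bucket_size` rows are assumptions about what bucketSizeLimit means, sizeLimit is not modeled, and an exact `Counter` stands in for the approximate CMS counts.

```python
from collections import Counter


def partial_facets(partitions, bucket_size, top_n=20):
    """Yield a cumulative snapshot every `bucket_size` rows, plus a final one."""
    counts = Counter()   # exact stand-in for the approximate CMS counts
    total = 0.0          # running sum(price)
    lowest = None        # running min(price)
    rows = 0
    emitted_at = 0       # row count at the time of the last snapshot

    def snapshot():
        return {
            "rows": rows,
            "sum(price)": total,
            "min(price)": lowest,
            "facet": counts.most_common(top_n),
        }

    for docs in partitions:      # partitions are consumed strictly in order
        for doc in docs:
            price = doc["price"]
            total += price
            lowest = price if lowest is None else min(lowest, price)
            counts[doc["name"]] += 1
            rows += 1
            if rows - emitted_at >= bucket_size:
                emitted_at = rows
                yield snapshot()

    if rows > emitted_at:        # flush whatever arrived after the last snapshot
        yield snapshot()
```

Since this is a generator, a caller receives the first snapshot as soon as `bucket_size` rows have been read, without waiting for the remaining partitions.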
I wrote a prototype of this functionality based on Solr 7.
I implemented a class CMSFacetStream that extends TupleStream and implements
Expressible, and a class CMSMetric that extends Metric.

My current issues:
-       I return result tuples as soon as bucketSizeLimit is reached, but I
don't see the partial results in the response.
-       How can I return a JSON object from a Metric class?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiple-rollups-facets-in-one-streaming-aggregation-tp4291952p4348260.html
Sent from the Solr - User mailing list archive at Nabble.com.
