Re: Stream expressions: Break up multivalue field into usable tuples

Joel Bernstein Thu, 22 Sep 2016 15:06:37 -0700

You could use the facet() expression, which works with multi-valued fields.
It emits aggregated tuples that are useful for recommendations. For example:


facet(baskets,
      q="item:taco",
      buckets="item",
      bucketSorts="count(*) desc",
      bucketSizeLimit="100",
      count(*))
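
As a rough sketch of how such an expression could be run from a client,
the Python below builds a request URL for the collection's /stream handler
and parses the JSON that comes back. The host, port, and helper names
(SOLR_BASE, stream_url, tuples_from_response) are assumptions for
illustration, as is the response shape of result-set/docs with a trailing
EOF tuple:

```python
import json
from urllib.parse import urlencode

# Assumption: a local Solr node on the default port; adjust for your cluster.
SOLR_BASE = "http://localhost:8983/solr"

FACET_EXPR = """facet(baskets,
      q="item:taco",
      buckets="item",
      bucketSorts="count(*) desc",
      bucketSizeLimit="100",
      count(*))"""


def stream_url(collection, expr, base=SOLR_BASE):
    """Build a GET URL that passes the expression in the 'expr' parameter."""
    return f"{base}/{collection}/stream?{urlencode({'expr': expr})}"


def tuples_from_response(body):
    """Return the tuples from a /stream JSON body, dropping the EOF marker."""
    docs = json.loads(body)["result-set"]["docs"]
    return [d for d in docs if not d.get("EOF")]


url = stream_url("baskets", FACET_EXPR)
```

Fetching url with urllib.request.urlopen and passing the decoded body to
tuples_from_response should then give one tuple per bucket, each carrying
the item value and its count(*).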

You can feed this to scoreNodes() to score the tuples for a recommendation.
scoreNodes is a graph expression so it expects tuples to be formatted like
a node set. Specifically it looks for the following fields: node, field and
collection, which it uses to retrieve the IDF for each node.
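
To illustrate why the IDF matters for a recommendation, here is a small
Python sketch of TF-IDF-style scoring over node tuples. The formula, the
function name score_nodes, and the document frequencies are all assumptions
made up for illustration; they are not necessarily what scoreNodes computes
internally:

```python
import math


def score_nodes(nodes, doc_freq, num_docs):
    """Weight each node's count by an IDF term so common items rank lower.

    Assumed formula: count * (log(num_docs / (doc_freq + 1)) + 1).
    """
    scored = []
    for n in nodes:
        idf = math.log(num_docs / (doc_freq[n["node"]] + 1)) + 1.0
        scored.append({**n, "nodeScore": n["count(*)"] * idf})
    return sorted(scored, key=lambda n: n["nodeScore"], reverse=True)


# Hypothetical node set: items co-occurring with tacos in baskets.
nodes = [
    {"node": "eggs", "field": "item", "collection": "baskets", "count(*)": 50},
    {"node": "salsa", "field": "item", "collection": "baskets", "count(*)": 30},
]
# Hypothetical document frequencies: eggs are in almost every basket.
doc_freq = {"eggs": 900, "salsa": 40}

ranked = score_nodes(nodes, doc_freq, num_docs=1000)
```

With these made-up numbers salsa outranks eggs even though eggs co-occur
more often, because eggs appear in nearly every basket and so say little
about taco buyers in particular.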

The select() function can turn your facet response into a node set, so
scoreNodes can operate on it:

scoreNodes(
    select(
        facet(baskets,
              q="item:taco",
              buckets="item",
              bucketSorts="count(*) desc",
              bucketSizeLimit=100,
              count(*)),
        item as node,
        count(*),
        replace(collection, null, withValue=baskets),
        replace(field, null, withValue=item)))
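
The reshaping that select() performs here can be pictured with a short
Python sketch: rename item to node, keep count(*), and fill in the
collection and field values. The function name to_node_set and the sample
tuples are hypothetical, for illustration only:

```python
def to_node_set(facet_tuples, collection="baskets", field="item"):
    """Turn facet tuples into node-set tuples with node, field, collection."""
    return [
        {
            "node": t[field],          # item as node
            "count(*)": t["count(*)"],  # carried through unchanged
            "collection": collection,  # replace(collection, null, ...)
            "field": field,            # replace(field, null, ...)
        }
        for t in facet_tuples
    ]


# Hypothetical facet output, not real data.
facet_tuples = [
    {"item": "salsa", "count(*)": 12},
    {"item": "nachos", "count(*)": 7},
]
nodes = to_node_set(facet_tuples)
```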

There is a ticket open to have scoreNodes operate directly on the facet()
function, so you don't have to deal with the select() function:
https://issues.apache.org/jira/browse/SOLR-9537
I'd like to get to this soon.

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Sep 22, 2016 at 5:02 PM, Doug Turnbull <
dturnb...@opensourceconnections.com> wrote:

> I have documents like the following in my search index
>
> {
>    "shopper_id": 1234,
>    "basket_id": 2512,
>    "items_bought": ["eggs", "tacos", "nachos"]
> }
>
> {
>    "shopper_id": 1236,
>    "basket_id": 2515,
>    "items_bought": ["eggs", "tacos", "chicken", "bubble gum"]
> }
>
> I would like to use some of the stream expression capabilities (in this
> case I'm looking at the recsys stuff) but it seems like I need to break up
> my data into tuples like
>
> {
>    "shopper_id": 1234,
>    "basket_id": 2512,
>    "item": "egg"
> },
> {
>    "shopper_id": 1234,
>    "basket_id": 2512,
>    "item": "taco"
> },
> {
>    "shopper_id": 1234,
>    "basket_id": 2512,
>    "item": "nacho"
> }
> ...
>
> For various other reasons, I'd prefer to keep my original data model with
> Solr doc == one shopper basket.
>
> Now, is there a way to take the documents above, output from a search
> tuple source, and apply a stream mutator to emit baskets with a field
> broken up like above? (Do let me know if I'm missing something completely
> here.)
>
> Thanks!
> -Doug
>
