I just got a chance to try this and it worked great! Here is the working example
(it will be in the upcoming “How to use Vectors” tutorial):

let(
  a=select(
    search(films,
           qt="/select",
           q="name:"Finding Nemo" OR name:"Bee Movie" OR name:"Harry Potter and the Chamber of Secrets"",
           fl="id,name,film_vector"),
    film_vector),
  b=col(a, film_vector),
  m=matrix(valueAt(b, 0), valueAt(b, 1), valueAt(b, 2)),
  average=scalarDivide(3, sumColumns(m))
)
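
For anyone who wants to sanity-check the numbers outside Solr, here is a minimal Python sketch of the same element-by-element average. It is not Solr code: the helper name average_vectors is made up for illustration, and the three vectors are copied from the select(...) output quoted further down in this thread. I am assuming the values come back as strings, as they do in that output, so they are converted to floats first.

```python
# Sketch only: reproduces the column-wise average in plain Python, not in Solr.
# average_vectors is a hypothetical helper name, not a Solr function.

def average_vectors(vectors):
    """Return the element-by-element mean of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    length = len(vectors[0])
    if any(len(v) != length for v in vectors):
        raise ValueError("all vectors must have the same length")
    # zip(*vectors) walks the columns, much like sumColumns(m) on the matrix;
    # dividing by the number of rows plays the role of scalarDivide(3, ...).
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# film_vector values as quoted in this thread (returned as strings).
raw_vectors = [
    ["-0.2758314", "-0.14416906", "-0.11316811", "0.2745105", "0.040616427",
     "-4.2628963E-4", "-0.120363355", "0.07888852", "0.036417373", "-0.29541242"],
    ["-0.11665395", "0.04247921", "-0.13233364", "0.52578413", "-0.1739291",
     "-0.01880563", "-0.06670809", "-0.11242808", "0.09724514", "-0.11909142"],
    ["-0.14272659", "0.13051921", "-0.19087574", "0.44983688", "-0.21098459",
     "0.0033124345", "-0.008155139", "-0.09109363", "0.12401622", "-0.12211737"],
]

# Convert the string values to floats before doing any math.
film_vectors = [[float(value) for value in row] for row in raw_vectors]

average = average_vectors(film_vectors)
print([round(value, 4) for value in average])
```

Rounded to four decimal places, the result lines up with the average vector from the films README that I was trying to reproduce.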


> On Oct 15, 2023, at 11:53 PM, Joel Bernstein <joels...@gmail.com> wrote:
> 
> This would in theory return the average of the vectors:
> 
> let(a=select(search(...), film_vector),
>     b=col(a, film_vector),
>     m=matrix(valueAt(b, 0), valueAt(b, 1), valueAt(b, 2)),
>     av=scalarDivide(3, sumColumns(m)))
> 
> 
> 
> 
> Joel Bernstein
> http://joelsolr.blogspot.com/
> 
> 
> On Sat, Oct 14, 2023 at 2:50 PM ufuk yılmaz <uyil...@vivaldi.net.invalid>
> wrote:
> 
>> The main thing which converts search result fields to arrays is the “col”
>> function
>> https://solr.apache.org/guide/8_4/vectorization.html#creating-a-vector-with-the-col-function
>> 
>> You may also need “let” to use variables etc. The rest is just employing
>> available math functions.
>> 
>> But they don’t play well with multivalued fields, which are hard to work
>> with. They look like arrays but are not exactly arrays; they are just a bunch
>> of values stuck together. For example, afaik there’s no way to refer to the
>> 1st or 2nd element of a multivalued field. When you enable docValues and use
>> the export handler, those values are returned in ascending order,
>> losing position information.
>> 
>> For example, if the ratings were from different movie raters, such as imdb,
>> rottentomatoes etc., and every rating was in a different field, it would be
>> much easier to work with, as Solr expects to build arrays and matrices from
>> documents formatted that way.
>> 
>> I’d be happy to learn if someone more knowledgeable has a better answer.
>> 
>> From: Eric Pugh
>> Sent: Saturday, October 14, 2023 8:05 PM
>> To: users@solr.apache.org
>> Subject: Re: Vector math with Streaming Expressions?
>> 
>> By “average them”, I mean the first version. So at the end, I get a set of
>> numbers that represents the average vector.
>> 
>> Here is an example of the vector:
>> https://github.com/apache/solr/blob/main/solr/example/films/films.json#L8365
>> 
>> In the existing docs on searching vectors, we make a statement that we
>> have the average vector of three movies:
>> https://github.com/apache/solr/blob/main/solr/example/films/README.md?plain=1#L154
>> 
>> I’d actually like to figure out how to calculate that vector from data we
>> have in Solr already.
>> 
>> 
>> 
>>> On Oct 14, 2023, at 12:50 PM, ufuk yılmaz <uyil...@vivaldi.net.INVALID> wrote:
>>> 
>>> By “average them” do you mean to calculate the simple arithmetic average,
>>> element by element, of all the returned film ratings? E.g. sum the first
>>> element of all arrays and divide by the number of arrays, then do it again
>>> for the second element, etc.
>>> 
>>> Or do you mean to find the average of the array for each movie, producing a
>>> single number for each movie?
>>> 
>>> ~ufuk
>>> 
>>> —
>>> 
>>>> On 14 Oct 2023, at 19:19, Eric Pugh <ep...@opensourceconnections.com> wrote:
>>>> 
>>>> I’m trying to average three arrays of floats and not quite making the
>>>> conceptual jump from “I defined an array of numbers”, in the way that the
>>>> https://github.com/apache/lucene-solr/blob/visual-guide/solr/solr-ref-guide/src/vector-math.adoc#element-by-element-vector-math
>>>> example expects, to “I made a query and got back an array of numbers”.
>>>> 
>>>> I’m using the films example, so: bin/solr start -c -e films
>>>> 
>>>> Then, I want to get the vectors for three films and average them.
>>>> 
>>>> The streaming expression grabs the three vectors, but I can’t figure
>>>> out how to wrap it in something to average them.
>>>> 
>>>> select(
>>>>   search(films,
>>>>          qt="/select",
>>>>          q="name:"Finding Nemo" OR name:"Bee Movie" OR name:"Harry Potter and the Chamber of Secrets"",
>>>>          fl="id,name,film_vector"),
>>>>   film_vector
>>>> )
>>>> 
>>>> produces:
>>>> 
>>>> {
>>>> "result-set": {
>>>>  "docs": [
>>>>    {
>>>>      "film_vector": [
>>>>        "-0.2758314",
>>>>        "-0.14416906",
>>>>        "-0.11316811",
>>>>        "0.2745105",
>>>>        "0.040616427",
>>>>        "-4.2628963E-4",
>>>>        "-0.120363355",
>>>>        "0.07888852",
>>>>        "0.036417373",
>>>>        "-0.29541242"
>>>>      ]
>>>>    },
>>>>    {
>>>>      "film_vector": [
>>>>        "-0.11665395",
>>>>        "0.04247921",
>>>>        "-0.13233364",
>>>>        "0.52578413",
>>>>        "-0.1739291",
>>>>        "-0.01880563",
>>>>        "-0.06670809",
>>>>        "-0.11242808",
>>>>        "0.09724514",
>>>>        "-0.11909142"
>>>>      ]
>>>>    },
>>>>    {
>>>>      "film_vector": [
>>>>        "-0.14272659",
>>>>        "0.13051921",
>>>>        "-0.19087574",
>>>>        "0.44983688",
>>>>        "-0.21098459",
>>>>        "0.0033124345",
>>>>        "-0.008155139",
>>>>        "-0.09109363",
>>>>        "0.12401622",
>>>>        "-0.12211737"
>>>>      ]
>>>>    },
>>>>    {
>>>>      "EOF": true,
>>>>      "RESPONSE_TIME": 24
>>>>    }
>>>>  ]
>>>> }
>>>> }
>>>> 
>>>> Great, now how do I average across them and get the final vector that I
>>>> expect, which should be similar to:
>>>> 
>>>> [-0.1784, 0.0096, -0.1455, 0.4167, -0.1148, -0.0053, -0.0651, -0.0415, 0.0859, -0.1789]
>>>> 
>>>> Thanks!
>>>> 
>>>> Eric
>>>> 
>>>> _______________________
>>>> Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467
>> | http://www.opensourceconnections.com <
>> http://www.opensourceconnections.com/><
>> http://www.opensourceconnections.com/> | My Free/Busy <
>> http://tinyurl.com/eric-cal>
>>>> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <
>> https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
>> 
>>>> This e-mail and all contents, including attachments, is considered to
>> be Company Confidential unless explicitly stated otherwise, regardless of
>> whether attachments are marked as such.
>> 

_______________________
Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | 
http://www.opensourceconnections.com <http://www.opensourceconnections.com/> | 
My Free/Busy <http://tinyurl.com/eric-cal>  
Co-Author: Apache Solr Enterprise Search Server, 3rd Ed 
<https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
    
This e-mail and all contents, including attachments, is considered to be 
Company Confidential unless explicitly stated otherwise, regardless of whether 
attachments are marked as such.
