I just got a chance to try this, and it worked great! Here is the working example (it will be in the upcoming “How to use Vectors” tutorial):

let(
  a=select(
    search(films,
      qt="/select",
      q="name:"Finding Nemo" OR name:"Bee Movie" OR name:"Harry Potter and the Chamber of Secrets"",
      fl="id,name,film_vector"),
    film_vector),
  b=col(a, film_vector),
  m=matrix(valueAt(b, 0), valueAt(b, 1), valueAt(b, 2)),
  average=scalarDivide(3, sumColumns(m))
)

> On Oct 15, 2023, at 11:53 PM, Joel Bernstein <joels...@gmail.com> wrote:
>
> This would in theory return the average of the vectors:
>
> let(a=select(search(...), film_vector),
>     b=col(a, film_vector),
>     m=matrix(valueAt(b, 0), valueAt(b, 1), valueAt(b, 2)),
>     av=scalarDivide(3, sumColumns(m)))
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Sat, Oct 14, 2023 at 2:50 PM ufuk yılmaz <uyil...@vivaldi.net.invalid> wrote:
>
>> The main thing that converts search result fields to arrays is the “col”
>> function:
>> https://solr.apache.org/guide/8_4/vectorization.html#creating-a-vector-with-the-col-function
>>
>> You may also need “let” to use variables, etc. The rest is just employing
>> the available math functions.
>>
>> But they don’t play well with multivalued fields, which are hard to work
>> with. They look like arrays but are not exactly arrays; they are just a
>> bunch of values sticking together. For example, afaik there’s no way to
>> refer to the 1st or 2nd element of a multivalued field. When you enable
>> docValues and use the export handler, those values are returned in
>> ascending order, losing position information.
>>
>> For example, if the ratings were from different movie raters, such as imdb,
>> rottentomatoes, etc., and every rating were in a different field, it would
>> be much easier to work with, as Solr expects to build arrays and matrices
>> from documents formatted that way.
>>
>> I’d be happy to learn if someone more knowledgeable has a better answer.
>>
>> From: Eric Pugh
>> Sent: Saturday, October 14, 2023 8:05 PM
>> To: users@solr.apache.org
>> Subject: Re: Vector math with Streaming Expressions?
>>
>> By “average them”, I mean the first version. So at the end, I get a set of
>> numbers that represents the average vector.
>>
>> Here is an example of the vector:
>> https://github.com/apache/solr/blob/main/solr/example/films/films.json#L8365
>>
>> In the existing docs on searching vectors, we state that we have the
>> average vector of three movies:
>> https://github.com/apache/solr/blob/main/solr/example/films/README.md?plain=1#L154
>>
>> I’d actually like to figure out how to calculate that vector from data we
>> already have in Solr.
>>
>>> On Oct 14, 2023, at 12:50 PM, ufuk yılmaz <uyil...@vivaldi.net.INVALID> wrote:
>>>
>>> By “average them”, do you mean calculating the simple arithmetic average,
>>> element by element, of all the returned film ratings? E.g., sum the first
>>> element of all arrays and divide by the number of arrays, then do it again
>>> for the second element, etc.
>>>
>>> Or do you mean finding the average of the array for each movie, producing
>>> a single number for each movie?
>>>
>>> ~ufuk
>>>
>>>> On 14 Oct 2023, at 19:19, Eric Pugh <ep...@opensourceconnections.com> wrote:
>>>>
>>>> I’m trying to average three arrays of floats and not quite making the
>>>> conceptual jump from “I defined an array of numbers”, in the way that the
>>>> https://github.com/apache/lucene-solr/blob/visual-guide/solr/solr-ref-guide/src/vector-math.adoc#element-by-element-vector-math
>>>> example expects, to “I made a query and got back an array of numbers”.
>>>>
>>>> I’m using the films example, so: bin/solr start -c -e films
>>>>
>>>> Then, I want to get the vectors for three films and average them.
>>>>
>>>> The streaming expression grabs the three vectors, but I can’t figure out
>>>> how to wrap it in something to average them.
>>>>
>>>> select(
>>>>   search(films,
>>>>     qt="/select",
>>>>     q="name:"Finding Nemo" OR name:"Bee Movie" OR name:"Harry Potter and the Chamber of Secrets"",
>>>>     fl="id,name,film_vector"),
>>>>   film_vector
>>>> )
>>>>
>>>> produces:
>>>>
>>>> {
>>>>   "result-set": {
>>>>     "docs": [
>>>>       {
>>>>         "film_vector": [
>>>>           "-0.2758314",
>>>>           "-0.14416906",
>>>>           "-0.11316811",
>>>>           "0.2745105",
>>>>           "0.040616427",
>>>>           "-4.2628963E-4",
>>>>           "-0.120363355",
>>>>           "0.07888852",
>>>>           "0.036417373",
>>>>           "-0.29541242"
>>>>         ]
>>>>       },
>>>>       {
>>>>         "film_vector": [
>>>>           "-0.11665395",
>>>>           "0.04247921",
>>>>           "-0.13233364",
>>>>           "0.52578413",
>>>>           "-0.1739291",
>>>>           "-0.01880563",
>>>>           "-0.06670809",
>>>>           "-0.11242808",
>>>>           "0.09724514",
>>>>           "-0.11909142"
>>>>         ]
>>>>       },
>>>>       {
>>>>         "film_vector": [
>>>>           "-0.14272659",
>>>>           "0.13051921",
>>>>           "-0.19087574",
>>>>           "0.44983688",
>>>>           "-0.21098459",
>>>>           "0.0033124345",
>>>>           "-0.008155139",
>>>>           "-0.09109363",
>>>>           "0.12401622",
>>>>           "-0.12211737"
>>>>         ]
>>>>       },
>>>>       {
>>>>         "EOF": true,
>>>>         "RESPONSE_TIME": 24
>>>>       }
>>>>     ]
>>>>   }
>>>> }
>>>>
>>>> Great, now how do I average across them and get the final vector that I
>>>> expect, which should be similar to:
>>>>
>>>> [-0.1784, 0.0096, -0.1455, 0.4167, -0.1148, -0.0053, -0.0651, -0.0415, 0.0859, -0.1789]
>>>>
>>>> Thanks!
>>>>
>>>> Eric
>>>>
>>>> _______________________
>>>> Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com | My Free/Busy <http://tinyurl.com/eric-cal>
>>>> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
>>>> This e-mail and all contents, including attachments, is considered to be
>>>> Company Confidential unless explicitly stated otherwise, regardless of
>>>> whether attachments are marked as such.
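
For anyone who wants to sanity-check the result outside Solr, here is a minimal Python sketch of the same element-by-element average, using the three vectors from the search() output quoted in this thread. The helper name average_vector is made up for illustration and is not part of Solr; the values are parsed from strings only because that is how the quoted output shows them.

```python
# Vectors copied from the search() output quoted in this thread.
# The quoted output shows the values as strings, so they are parsed as floats.
film_vectors = [
    ["-0.2758314", "-0.14416906", "-0.11316811", "0.2745105", "0.040616427",
     "-4.2628963E-4", "-0.120363355", "0.07888852", "0.036417373", "-0.29541242"],
    ["-0.11665395", "0.04247921", "-0.13233364", "0.52578413", "-0.1739291",
     "-0.01880563", "-0.06670809", "-0.11242808", "0.09724514", "-0.11909142"],
    ["-0.14272659", "0.13051921", "-0.19087574", "0.44983688", "-0.21098459",
     "0.0033124345", "-0.008155139", "-0.09109363", "0.12401622", "-0.12211737"],
]


def average_vector(vectors):
    """Hypothetical helper: sum each column, then divide by the number of rows."""
    rows = [[float(value) for value in vector] for vector in vectors]
    # zip(*rows) walks the vectors column by column (position 0, 1, 2, ...).
    return [sum(column) / len(rows) for column in zip(*rows)]


average = average_vector(film_vectors)
print([round(value, 4) for value in average])
```

The column sum followed by a division by the row count mirrors the sumColumns(m) and scalarDivide(3, ...) steps in the streaming expression, and the rounded values should line up with the expected vector quoted in the original question.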