Hi Joel,

Thank you for pointing me to that part of the documentation. valueAt() is
exactly what I needed here.
However, as you point out, there seems to be no way to directly get the
matrix from a multidimensional array.
As a consequence, my streaming expression is very verbose for my purpose
(I run this over a thousand documents), but it does work this way, and it
also saves me an extra query to get the ids from a text search, for
instance:

let(
    s=search(test,q="*",fl="feature"),
    f1=valueAt(col(s, feature),0),
    f2=valueAt(col(s, feature),1),
    f3=valueAt(col(s, feature),2),
    m=transpose(matrix(f1,f2,f3)),
    d=distance(m,cosine())
)
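
Since the expression needs one valueAt() line per document, it could be
generated by a small script rather than written by hand. Below is a minimal
Python sketch of that idea. build_distance_expression is a hypothetical
helper name; it only builds the expression string and does not talk to
Solr, and it assumes that the search returns at least n_docs documents.

```python
def build_distance_expression(collection, field, n_docs, query="*"):
    """Build a let(...) streaming expression with one valueAt() per document.

    Hypothetical helper: it mirrors the hand-written expression above and
    assumes the search returns at least n_docs documents.
    """
    if n_docs < 1:
        raise ValueError("n_docs must be at least 1")
    # Variable names f1, f2, ..., one per document vector.
    names = [f"f{i + 1}" for i in range(n_docs)]
    lines = [f'    s=search({collection},q="{query}",fl="{field}"),']
    for i, name in enumerate(names):
        # valueAt() picks the i-th array out of the array of arrays.
        lines.append(f"    {name}=valueAt(col(s, {field}),{i}),")
    lines.append(f"    m=transpose(matrix({','.join(names)})),")
    lines.append("    d=distance(m,cosine())")
    return "let(\n" + "\n".join(lines) + "\n)"


expr = build_distance_expression("test", "feature", 3)
print(expr)
```

With n_docs set to 3 this prints essentially the same expression as the
one above; for a thousand documents only the last argument changes.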


Thank you again,
Best,

Xavier

On Thu, 29 Apr 2021 at 16:04, Joel Bernstein <[email protected]> wrote:

> That's interesting, it seems like you've indexed a matrix into a field.
>
> If that's the case I think you'll need to access the arrays using the index
> as described here:
>
> https://solr.apache.org/guide/8_8/vector-math.html#getting-values-by-index
>
> Then you can create a matrix from the arrays.
>
> I guess we need to add a way to materialize the matrix directly from a
> multidimensional array.
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Tue, Apr 27, 2021 at 6:00 PM FAVORY , XAVIER <[email protected]>
> wrote:
>
> > Hello everyone,
> >
> > I am currently trying to create a system for performing distance
> > computation of different documents based on some pre-computed numerical
> > feature vector.
> >
> > I set up Solr (cloud) 8.7 and I am using streaming expressions. I have
> > documents as such, with the feature field being pfloat with multiValued
> > set to True:
> >
> >       {
> >         "id":"1",
> >         "feature":[
> >           0.1,
> >           0.5,
> >           0.6,
> >           1.7]
> >       },
> >       {
> >         "id":"2",
> >         "feature":[
> >           0.5,
> >           0.1,
> >           0.7,
> >           0.9]
> >       },
> >       {
> >         "id":"3",
> >         "feature":[
> >          -0.5,
> >           0.9,
> >           1.5,
> >           0.2]
> >       },
> >
> > I want to create a matrix so that I can use the distance() function to
> > compute the distances between the columns of the matrix. The
> > documentation provides an example of what I am interested in, defining
> > the vectors on the fly:
> >
> > let(a=array(20, 30, 40),
> >     b=array(21, 29, 41),
> >     c=array(31, 40, 50),
> >     d=matrix(a, b, c),
> >     c=distance(d))
> >
> > By transposing the matrix I can easily compute the distances between
> > the rows, so I can get what I want.
> >
> > However, now I want to extract the numerical features from a feature
> > field indexed in Solr. The documentation explains how to create a matrix
> > from numerical values stored in some fields:
> >
> > let(
> >     a=random(collection1, q="market:A", rows="5000", fl="price_f"),
> >     b=random(collection1, q="market:B", rows="5000", fl="price_f"),
> >     c=random(collection1, q="market:C", rows="5000", fl="price_f"),
> >     d=random(collection1, q="market:D", rows="5000", fl="price_f"),
> >     e=col(a, price_f),
> >     f=col(b, price_f),
> >     g=col(c, price_f),
> >     h=col(d, price_f),
> >     i=matrix(e, f, g, h),
> >     j=sumRows(i))
> >
> > However, in my case, I already have an array of float values for each
> > document. So I tried to do it this way:
> >
> > let(
> >     s1=search(test,q="id:1",fl="feature"), f1=col(s1, feature),
> >     s2=search(test,q="id:2",fl="feature"), f2=col(s2, feature),
> >     s3=search(test,q="id:3",fl="feature"), f3=col(s3, feature),
> >     m=matrix(f1,f2,f3)
> > )
> >
> > But I get this error:
> >
> > {
> >   "result-set": {
> >     "docs": [
> >       {
> >         "EXCEPTION": "Failed to evaluate expression matrix(f1,f2,f3) -
> > Numeric value expected but found type java.util.ArrayList for value
> > [0.1,0.5,0.6,1.7]",
> >         "EOF": true,
> >         "RESPONSE_TIME": 5
> >       }
> >     ]
> >   }
> > }
> >
> > When I inspect f3, I see that it is an array of arrays, which is why I
> > think the matrix creation fails here. I have searched a lot for a way to
> > create a matrix from float vectors stored in a field of my documents,
> > and I still cannot find a solution. What I could do is extract the
> > vectors and then define the arrays and the matrix on the fly, but I
> > would like to be able to do it in one request. Moreover, I find it
> > curious that I cannot directly create the matrix from the results of a
> > normal search. For instance, I would prefer to do something like this:
> >
> > let(s=search(test,q="*",fl="feature,id"), m=col(s,feature))
> >
> > which returns:
> >
> > {
> >   "result-set": {
> >     "docs": [
> >       {
> >         "m": [
> >           [
> >             0.1,
> >             0.5,
> >             0.6,
> >             1.7
> >           ],
> >           [
> >             0.5,
> >             0.1,
> >             0.7,
> >             0.9
> >           ],
> >           [
> >             -0.5,
> >             0.9,
> >             1.5,
> >             0.2
> >           ]
> >         ]
> >       },
> >       {
> >         "EOF": true,
> >         "RESPONSE_TIME": 3
> >       }
> >     ]
> >   }
> > }
> >
> > and be able to use the matrix I obtain here. But again, I was not able to
> > perform matrix operations on "m".
> >
> > Does anyone know an elegant way to create a matrix from the numerical
> > vectors stored in my feature field?
> >
> >
> > Thank you.
> > --
> > Xavier Favory
> > Music Technology Group
> > Universitat Pompeu Fabra
> >
>


-- 
Xavier Favory
Music Technology Group
Universitat Pompeu Fabra
