Well, I actually index an array in my field.
But when I use f1=col(s1, feature), the result comes back as an array of
arrays. I understand that col() collects a field's value from each of the
retrieved documents into an array, so with a multi-valued field it produces
a multidimensional array.
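
To make the shapes concrete, here is a minimal Python sketch (plain Python,
not Solr code). The names col_result, value_at and is_flat_numeric are
hypothetical stand-ins I made up for illustration; they only mimic what
col() and valueAt() appear to return in this thread.

```python
# Hypothetical stand-in for what col(s, feature) returns over three hits:
# one inner array per retrieved document, hence an array of arrays.
col_result = [
    [0.1, 0.5, 0.6, 1.7],
    [0.5, 0.1, 0.7, 0.9],
    [-0.5, 0.9, 1.5, 0.2],
]


def value_at(array, index):
    """Mimic valueAt(array, index): return the element at that position."""
    return array[index]


def is_flat_numeric(array):
    """True if every element is a plain number (what matrix() seems to expect)."""
    return all(isinstance(v, (int, float)) for v in array)


# Picking each inner array by index yields flat numeric vectors,
# one per document, which can then serve as matrix rows.
rows = [value_at(col_result, i) for i in range(len(col_result))]
```

Under these assumptions, col_result itself is not a flat numeric array,
while each element of rows is.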

Could it be that I am not using the most appropriate field type to
store my features? I just want to store some arrays (for instance, one
128-dim feature vector for each document).
Also, as it is now, I need to perform an extra request to find out the number
of results the query returns, so that I can then create the
streaming expression with the right number of "fn" variables.
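
For reference, this is roughly how the expression could be generated on the
client side once the number of results is known. It is only a sketch:
build_distance_expression and its parameters are hypothetical names of mine,
and the generated text simply repeats the valueAt() pattern from my previous
message.

```python
def build_distance_expression(collection, field, num_docs, query="*"):
    """Build the verbose let(...) expression with one fN variable per document.

    Hypothetical helper: it only assembles text following the
    valueAt(col(s, field), i) pattern and does not talk to Solr.
    """
    if num_docs < 1:
        raise ValueError("num_docs must be at least 1")

    lines = [
        "let(",
        f'    s=search({collection},q="{query}",fl="{field}"),',
    ]
    # One variable per document: f1 takes index 0, f2 takes index 1, ...
    for i in range(num_docs):
        lines.append(f"    f{i + 1}=valueAt(col(s, {field}),{i}),")

    names = ",".join(f"f{i + 1}" for i in range(num_docs))
    # Transpose so that each document becomes a column for distance().
    lines.append(f"    m=transpose(matrix({names})),")
    lines.append("    d=distance(m,cosine())")
    lines.append(")")
    return "\n".join(lines)


expr = build_distance_expression("test", "feature", 3)
```

The resulting string can then be sent as the expr parameter of the stream
request; num_docs still has to come from the extra count request.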




On Thu, 29 Apr 2021 at 16:58, Joel Bernstein <[email protected]> wrote:

> I agree this is very verbose. I didn't even realize you could index a
> multidimensional array into a multi-value field until now. Knowing this, it
> makes sense to support matrix creation directly from multi-value arrays.
> I'll add this when I get some time.
>
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Thu, Apr 29, 2021 at 10:46 AM FAVORY , XAVIER <[email protected]>
> wrote:
>
> > Hi Joel,
> >
> > Thank you for pointing me to that part of the documentation. valueAt() is
> > exactly what I needed here.
> > However, as you point out, there seems to be no way to directly get the
> > matrix from a multidimensional array.
> > As a consequence, my streaming expression is very verbose and quite long
> > for my purpose (I perform this over a thousand documents), but it actually
> > works that way (and I no longer need the extra queries to get the
> > ids from a text search, for instance):
> >
> > let(
> >     s=search(test,q="*",fl="feature"),
> >     f1=valueAt(col(s, feature ),0),
> >     f2=valueAt(col(s, feature ),1),
> >     f3=valueAt(col(s, feature ),2),
> >     m=transpose(matrix(f1,f2,f3)),
> >     d=distance(m,cosine())
> > )
> >
> >
> > Thank you again,
> > Best,
> >
> > Xavier
> >
> > On Thu, 29 Apr 2021 at 16:04, Joel Bernstein <[email protected]> wrote:
> >
> > > That's interesting, it seems like you've indexed a matrix into a field.
> > >
> > > If that's the case I think you'll need to access the arrays using the index
> > > as described here:
> > >
> > > https://solr.apache.org/guide/8_8/vector-math.html#getting-values-by-index
> > >
> > > Then you can create a matrix from the arrays.
> > >
> > > I guess we need to add a way to materialize the matrix directly from a
> > > multidimensional array.
> > >
> > >
> > > Joel Bernstein
> > > http://joelsolr.blogspot.com/
> > >
> > >
> > > On Tue, Apr 27, 2021 at 6:00 PM FAVORY , XAVIER <[email protected]> wrote:
> > >
> > > > Hello everyone,
> > > >
> > > > I am currently trying to create a system for computing distances
> > > > between documents based on pre-computed numerical feature vectors.
> > > >
> > > > I set up Solr (cloud) 8.7 and I am using streaming expressions. I have
> > > > documents such as the following, with the feature field being pfloat with
> > > > multiValued set to true:
> > > >
> > > >       {
> > > >         "id":"1",
> > > >         "feature":[
> > > >           0.1,
> > > >           0.5,
> > > >           0.6,
> > > >           1.7]
> > > >       },
> > > >       {
> > > >         "id":"2",
> > > >         "feature":[
> > > >           0.5,
> > > >           0.1,
> > > >           0.7,
> > > >           0.9]
> > > >       },
> > > >       {
> > > >         "id":"3",
> > > >         "feature":[
> > > >          -0.5,
> > > >           0.9,
> > > >           1.5,
> > > >           0.2]
> > > >       }
> > > >
> > > > I want to create a matrix so I can then use the distance() function to
> > > > compute the distances between the columns of the matrix. The documentation
> > > > provides an example of what I am interested in, defining the vectors on
> > > > the fly:
> > > >
> > > > let(a=array(20, 30, 40),
> > > >     b=array(21, 29, 41),
> > > >     c=array(31, 40, 50),
> > > >     d=matrix(a, b, c),
> > > >     c=distance(d))
> > > >
> > > > By transposing the matrix I can easily compute the distances between the
> > > > rows, so I can get what I want.
> > > >
> > > > However, now I want to extract the numerical features from a feature field
> > > > indexed in Solr. The documentation explains how to create a matrix from
> > > > numerical values stored in some fields:
> > > >
> > > > let(
> > > >     a=random(collection1, q="market:A", rows="5000", fl="price_f"),
> > > >     b=random(collection1, q="market:B", rows="5000", fl="price_f"),
> > > >     c=random(collection1, q="market:C", rows="5000", fl="price_f"),
> > > >     d=random(collection1, q="market:D", rows="5000", fl="price_f"),
> > > >     e=col(a, price_f),
> > > >     f=col(b, price_f),
> > > >     g=col(c, price_f),
> > > >     h=col(d, price_f),
> > > >     i=matrix(e, f, g, h),
> > > >     j=sumRows(i))
> > > >
> > > > However, in my case, I already have an array of float values for each
> > > > document. So I try to do it that way:
> > > >
> > > > let(
> > > >     s1=search(test,q="id:1",fl="feature"), f1=col(s1, feature),
> > > >     s2=search(test,q="id:2",fl="feature"), f2=col(s2, feature),
> > > >     s3=search(test,q="id:3",fl="feature"), f3=col(s3, feature),
> > > >     m=matrix(f1,f2,f3)
> > > > )
> > > >
> > > > But I get this error:
> > > >
> > > > {
> > > >   "result-set": {
> > > >     "docs": [
> > > >       {
> > > >         "EXCEPTION": "Failed to evaluate expression matrix(f1,f2,f3) - Numeric value expected but found type java.util.ArrayList for value [0.1,0.5,0.6,1.7]",
> > > >         "EOF": true,
> > > >         "RESPONSE_TIME": 5
> > > >       }
> > > >     ]
> > > >   }
> > > > }
> > > >
> > > > When I inspect what I get as f3, I see that I have an array of arrays,
> > > > which is why I think it fails to create the matrix here. I've been searching
> > > > a lot for how to create a matrix from float vectors stored in a field of my
> > > > documents, and I still cannot find any solution. What I could do is extract
> > > > the vectors first and then build the arrays and the matrix on the fly,
> > > > but I would like to be able to do it in one request. Moreover, I find it
> > > > really curious that I cannot directly create the matrix from the results of
> > > > a normal search. For instance, I would prefer to do something like this:
> > > >
> > > > let(s=search(test,q="*",fl="feature,id"), m=col(s,feature))
> > > >
> > > > which returns:
> > > >
> > > > {
> > > >   "result-set": {
> > > >     "docs": [
> > > >       {
> > > >         "m": [
> > > >           [
> > > >             0.1,
> > > >             0.5,
> > > >             0.6,
> > > >             1.7
> > > >           ],
> > > >           [
> > > >             0.5,
> > > >             0.1,
> > > >             0.7,
> > > >             0.9
> > > >           ],
> > > >           [
> > > >             -0.5,
> > > >             0.9,
> > > >             1.5,
> > > >             0.2
> > > >           ]
> > > >         ]
> > > >       },
> > > >       {
> > > >         "EOF": true,
> > > >         "RESPONSE_TIME": 3
> > > >       }
> > > >     ]
> > > >   }
> > > > }
> > > >
> > > > and be able to use the matrix I obtain here. But again, I was not able to
> > > > perform matrix operations on "m".
> > > >
> > > > Does anyone know any elegant way to create a matrix from my numerical
> > > > vectors stored in my feature field?
> > > >
> > > >
> > > > Thank you.
> > > > --
> > > > Xavier Favory
> > > > Music Technology Group
> > > > Universitat Pompeu Fabra
> > > >
> > >
> >
> >
> > --
> > Xavier Favory
> > Music Technology Group
> > Universitat Pompeu Fabra
> >
>


-- 
Xavier Favory
Music Technology Group
Universitat Pompeu Fabra
