Well, I actually index a single array per document in my field. But when I use f1=col(s1, feature), each document's value comes back as a multi-valued field. As I understand it, col() extracts a field value from each of the retrieved documents and collects the values into an array, so with a multi-valued field the result is an array of arrays, i.e. a multidimensional array.
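Since col() returns one inner array per document, the valueAt() workaround quoted below needs one variable per document. A minimal sketch of generating that expression on the client side, assuming the number of documents is already known (for instance from a previous query); the helper name build_distance_expression is hypothetical and not part of Solr:

```python
def build_distance_expression(collection: str, field: str, n_docs: int) -> str:
    """Build the verbose let(...) streaming expression for n_docs documents.

    Hypothetical client-side helper: it only assembles a string and does not
    talk to Solr. n_docs must be known beforehand.
    """
    if n_docs < 1:
        raise ValueError("n_docs must be at least 1")

    # One variable name per document: f1, f2, ..., fN
    names = [f"f{i + 1}" for i in range(n_docs)]

    # Same search as in the quoted expression
    lines = [f's=search({collection},q="*",fl="{field}")']

    # valueAt(col(...), i) picks the i-th inner array (0-based index)
    lines += [
        f"{name}=valueAt(col(s, {field}),{i})" for i, name in enumerate(names)
    ]

    # Rows are documents; transpose so that distance() compares documents
    lines.append(f"m=transpose(matrix({','.join(names)}))")
    lines.append("d=distance(m,cosine())")

    body = ",\n".join("  " + line for line in lines)
    return f"let(\n{body}\n)"


if __name__ == "__main__":
    print(build_distance_expression("test", "feature", 3))
```

For n_docs=3 this reproduces the expression quoted below, up to whitespace. It does not remove the need to know the number of results in advance; it only avoids writing the "fn" variables by hand.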
Could it be possible that I am not using the most adequate field type to store my features? I just want to store some arrays (for instance one 128-dim feature vector for each document).

Also, as it is now, I need to perform an extra request to know the number of results I get from the query. This way I can then create the right streaming expression, with the right number of "fn" variables.

On Thu, 29 Apr 2021 at 16:58, Joel Bernstein <[email protected]> wrote:

> I agree this is very verbose. I didn't even realize you could index a
> multidimensional array into a multi-value field until now. Knowing this it
> makes sense to support matrix creation directly from multi-value arrays.
> I'll add this when i get some time.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Thu, Apr 29, 2021 at 10:46 AM FAVORY , XAVIER <[email protected]>
> wrote:
>
> > Hi Joel,
> >
> > Thank you for pointing me to that part of the documentation. valueAt() is
> > exactly what I needed here.
> > However, as you point out, there seems to be no way to directly get the
> > matrix from a multidimensional array.
> > As a consequence, my streaming expression is very verbose and quite long
> > for my purpose (I perform this over a thousand documents), but it actually
> > works by doing it that way (and I get rid of an extra queries to get the
> > ids from a text search for instance):
> >
> > let(
> >   s=search(test,q="*",fl="feature"),
> >   f1=valueAt(col(s, feature ),0),
> >   f2=valueAt(col(s, feature ),1),
> >   f3=valueAt(col(s, feature ),2),
> >   m=transpose(matrix(f1,f2,f3)),
> >   d=distance(m,cosine())
> > )
> >
> > Thank you again,
> > Best,
> >
> > Xavier
> >
> > On Thu, 29 Apr 2021 at 16:04, Joel Bernstein <[email protected]> wrote:
> >
> > > That's interesting, it seems like you've indexed a matrix into a field.
> > >
> > > If that's the case I think you'll need to access the arrays using the
> > > index as described here:
> > >
> > > https://solr.apache.org/guide/8_8/vector-math.html#getting-values-by-index
> > >
> > > Then you can create a matrix from the arrays.
> > >
> > > I guess we need to add a way to materialize the matrix directly from a
> > > multidimensional array.
> > >
> > > Joel Bernstein
> > > http://joelsolr.blogspot.com/
> > >
> > > On Tue, Apr 27, 2021 at 6:00 PM FAVORY , XAVIER <[email protected]>
> > > wrote:
> > >
> > > > Hello everyone,
> > > >
> > > > I am currently trying to create a system for performing distance
> > > > computation of different documents based on some pre-computed numerical
> > > > feature vector.
> > > >
> > > > I set up Solr (cloud) 8.7 and I am using streaming expressions. I have
> > > > documents as such, with the feature field being pfloat with multiValued
> > > > set to True:
> > > >
> > > > {
> > > >   "id":"1",
> > > >   "feature":[0.1, 0.5, 0.6, 1.7],
> > > > },
> > > > {
> > > >   "id":"2",
> > > >   "feature":[0.5, 0.1, 0.7, 0.9],
> > > > },
> > > > {
> > > >   "id":"3",
> > > >   "feature":[-0.5, 0.9, 1.5, 0.2],
> > > > },
> > > >
> > > > I want to create a matrix so I can then use the distance() function to
> > > > compute the distances for the columns of a matrix. The documentation
> > > > provides an example of what I am interested in, by defining the vectors
> > > > on the fly:
> > > >
> > > > let(a=array(20, 30, 40),
> > > >     b=array(21, 29, 41),
> > > >     c=array(31, 40, 50),
> > > >     d=matrix(a, b, c),
> > > >     c=distance(d))
> > > >
> > > > By transposing the matrix I can easily perform the distance between the
> > > > rows, so I can get what I want.
> > > >
> > > > However, now I want to extract the numerical features from a feature
> > > > field indexed in Solr.
> > > > The documentation explains how to create a matrix from numerical
> > > > values stored in some fields:
> > > >
> > > > let(
> > > >   a=random(collection1, q="market:A", rows="5000", fl="price_f"),
> > > >   b=random(collection1, q="market:B", rows="5000", fl="price_f"),
> > > >   c=random(collection1, q="market:C", rows="5000", fl="price_f"),
> > > >   d=random(collection1, q="market:D", rows="5000", fl="price_f"),
> > > >   e=col(a, price_f),
> > > >   f=col(b, price_f),
> > > >   g=col(c, price_f),
> > > >   h=col(d, price_f),
> > > >   i=matrix(e, f, g, h),
> > > >   j=sumRows(i))
> > > >
> > > > However, in my case, I already have an array of float values for each
> > > > document. So I try to do it that way:
> > > >
> > > > let(
> > > >   s1=search(test,q="id:1",fl="feature"), f1=col(s1, feature),
> > > >   s2=search(test,q="id:2",fl="feature"), f2=col(s2, feature),
> > > >   s3=search(test,q="id:3",fl="feature"), f3=col(s3, feature),
> > > >   m=matrix(f1,f2,f3)
> > > > )
> > > >
> > > > But I get this error:
> > > >
> > > > {
> > > >   "result-set": {
> > > >     "docs": [
> > > >       {
> > > >         "EXCEPTION": "Failed to evaluate expression matrix(f1,f2,f3) - Numeric value expected but found type java.util.ArrayList for value [0.1,0.5,0.6,1.7]",
> > > >         "EOF": true,
> > > >         "RESPONSE_TIME": 5
> > > >       }
> > > >     ]
> > > >   }
> > > > }
> > > >
> > > > When I inspect what I get as f3, I see that I have an array of array,
> > > > which is why I think it is failing here to create the matrix. I've been
> > > > searching a lot on how to create a matrix from float vectors stored in a
> > > > field of my documents, and I still cannot find any solution. What I could
> > > > do is extract the vectors, create them on the fly, and construct the
> > > > vectors and matrix, but I would like to be able to do it in one request.
> > > > Moreover, I find it really curious that I cannot directly create the
> > > > matrix on the results of a normal search. For instance, I would prefer
> > > > to do something like that:
> > > >
> > > > s=search(test,q="*",fl="feature,id"), m=col(s,feature))
> > > >
> > > > which returns:
> > > >
> > > > {
> > > >   "result-set": {
> > > >     "docs": [
> > > >       {
> > > >         "m": [
> > > >           [0.1, 0.5, 0.6, 1.7],
> > > >           [0.5, 0.1, 0.7, 0.9],
> > > >           [-0.5, 0.9, 1.5, 0.2]
> > > >         ]
> > > >       },
> > > >       {
> > > >         "EOF": true,
> > > >         "RESPONSE_TIME": 3
> > > >       }
> > > >     ]
> > > >   }
> > > > }
> > > >
> > > > and be able to use the matrix I obtain here. But again, I was not able
> > > > to perform matrix operations on "m".
> > > >
> > > > Does anyone know any elegant way to create a matrix from my numerical
> > > > vectors stored in my feature field?
> > > >
> > > > Thank you.
> > > > --
> > > > Xavier Favory
> > > > Music Technology Group
> > > > Universitat Pompeu Fabra
> > >
> >
> > --
> > Xavier Favory
> > Music Technology Group
> > Universitat Pompeu Fabra
>

--
Xavier Favory
Music Technology Group
Universitat Pompeu Fabra
