Hi Mikhail,

sorry for not being clear, I'll try again.

As I understand it, the Solr scale function, once applied to a field, needs the min and max of that field. By default those values are calculated over all the existing documents. I don't know exactly how this is implemented internally in Solr; I assume that, in the worst case, all the documents have to be traversed, reading every value of the given field and keeping track of the min and max.

The Solr documentation for the scale function also says:

> The current implementation cannot distinguish when documents have been deleted or documents that have no value. It uses 0.0 values for these cases. This means that often the min value can be 0 if you have only positive values.
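To make the behaviour described above concrete, here is a minimal sketch of a linear min-max scale, assuming that this is what the scale function computes. The function name `scale`, the `missing_as_zero` flag and the sample values are hypothetical and only illustrate the documented 0.0 default; this is not Solr's actual implementation.

```python
def scale(values, target_min=0.0, target_max=1.0, missing_as_zero=True):
    """Linearly map values onto [target_min, target_max].

    None stands for a document without a value. With missing_as_zero=True
    it is counted as 0.0, mirroring the documented default; otherwise it is
    ignored when computing min/max and stays None in the output.
    """
    if missing_as_zero:
        observed = [0.0 if v is None else v for v in values]
    else:
        observed = [v for v in values if v is not None]

    lo, hi = min(observed), max(observed)
    span = hi - lo

    def scale_one(v):
        if v is None:
            if not missing_as_zero:
                return None
            v = 0.0
        if span == 0:
            # All values are identical, so there is no range to spread over.
            return target_min
        return (v - lo) / span * (target_max - target_min) + target_min

    return [scale_one(v) for v in values]


# Hypothetical field values; the second document has no value.
sample = [5.0, None, 20.0, 10.0]
print(scale(sample))                         # min is dragged down to 0.0
print(scale(sample, missing_as_zero=False))  # min is the smallest real value
```

With only positive values, a single document without a value is enough to pull the min down to 0.0, which changes every scaled result.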
But what happens if I need to scale the values of a field only within the documents that are the result of a query, say a few hundred or a few thousand documents? First of all, min and max have to be calculated only on the result set of the query. That is what I was trying to say when I wrote "apply the scale function only to the result set (and not to the entire collection)".

For example, if you apply the scale function to the field price in the Solr techproducts example, the "min" and "max" of price over the whole collection are 0.0 and 2199.0:

http://localhost:8983/solr/techproducts/select?q=*:*&rows=0&stats=true&stats.field=price

So even if a filter query is added - fq=popularity:(1 OR 7) - the values are still scaled against the range 0.0 to 2199.0:

http://localhost:8983/solr/techproducts/select?q=*:*&fq=popularity:(1%20OR%207)&rows=100&fl=price,scale(price,%200,%201)

{
  "responseHeader":{
    "status":0,
    "QTime":30,
    "params":{
      "q":"*:*",
      "fl":"price,scale(price, 0, 1)",
      "fq":"popularity:(1 OR 7)",
      "rows":"100"}},
  "response":{"numFound":6,"start":0,"numFoundExact":true,"docs":[
      {
        "price":74.99,
        "scale(price, 0, 1)":0.034101862},
      {
        "price":19.95,
        "scale(price, 0, 1)":0.009072306},
      {
        "price":11.5,
        "scale(price, 0, 1)":0.0052296496},
      {
        "price":329.95,
        "scale(price, 0, 1)":0.15004548},
      {
        "price":479.95,
        "scale(price, 0, 1)":0.2182583},
      {
        "price":649.99,
        "scale(price, 0, 1)":0.29558435}]
  }}

As you can see in the results of this query, the prices are between 11.5 and 649.99. What if I want to scale the prices between 11.5 and 649.99? Or, in other words, what is the easiest way to scale all the values of a field with the min and max of the current query results?

Right now I'm investigating the best way to scale the values of one or more fields within Solr, but only within the documents that are in the current result set.

Hope this helps to make things clearer.
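One possible direction, offered only as a sketch and not as something confirmed in this thread, is a two-pass approach: first ask the stats component for the min and max of price with the same q and fq, then send a second request that rescales with those fixed bounds using the sub() and div() function queries. The helper names (`stats_params`, `scaled_fl`, `scale_params`, `rescale`) and the `scaled` alias are hypothetical, and the bounds 11.5 and 649.99 are copied by hand from the results above rather than read from a live stats response. The code only builds the URLs; it does not contact Solr.

```python
from urllib.parse import urlencode

# Base URL taken from the examples in this thread.
SELECT_URL = "http://localhost:8983/solr/techproducts/select"


def stats_params(q, fqs, field):
    """First pass: request min/max of `field` over the documents matching q and every fq."""
    params = [("q", q), ("rows", "0"), ("stats", "true"), ("stats.field", field)]
    return params + [("fq", fq) for fq in fqs]


def scaled_fl(field, lo, hi, alias="scaled"):
    """Build an fl value that maps [lo, hi] linearly onto [0, 1] with sub() and div()."""
    if hi == lo:
        raise ValueError("min and max are equal, there is nothing to scale")
    return f"{field},{alias}:div(sub({field},{lo}),sub({hi},{lo}))"


def scale_params(q, fqs, field, lo, hi, rows=100):
    """Second pass: same q and fq, with the rescaled pseudo-field in fl."""
    params = [("q", q), ("rows", str(rows)), ("fl", scaled_fl(field, lo, hi))]
    return params + [("fq", fq) for fq in fqs]


def rescale(value, lo, hi):
    """Client-side equivalent of the fl expression, handy for checking numbers."""
    return (value - lo) / (hi - lo)


fqs = ["popularity:(1 OR 7)"]
stats_url = SELECT_URL + "?" + urlencode(stats_params("*:*", fqs, "price"))

# Assumption: these bounds would normally come from the stats response of the
# first request; here they are the smallest and largest price in the result set.
lo, hi = 11.5, 649.99
scale_url = SELECT_URL + "?" + urlencode(scale_params("*:*", fqs, "price", lo, hi))

print(stats_url)
print(scale_url)
print(rescale(74.99, lo, hi))       # scaled against the result set
print(rescale(74.99, 0.0, 2199.0))  # scaled against the whole collection
```

The cost is one extra round trip per query, and the bounds can be slightly stale if the index changes between the two requests. The benefit is that every fq is taken into account without having to repeat it inside the function.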

Best regards,
Vincenzo

On Tue, May 31, 2022 at 9:27 PM Mikhail Khludnev <[email protected]> wrote:

> Vincenzo,
> Can you elaborate what it means ' apply the scale function only to the
> result set (and not to the entire collection).' ?
>
> On Tue, May 31, 2022 at 4:33 PM Vincenzo D'Amore <[email protected]> wrote:
>
> > Hi Mikhail,
> >
> > I'm trying to apply the scale function only to the result set (and not to
> > the entire collection).
> > And I discovered that adding "query($q)" to the scale function does the
> > trick.
> > In other words, adding "query($q)" forces solr to restrict the scale
> > function only to the result set.
> >
> > But if I add an fq to the query parameters the scale function applies only
> > to the q param.
> > For example:
> >
> > http://localhost:8983/solr/techproducts/select?q=manu_id_s:(corsair%20belkin%20canon%20viewsonic)&fq=price:[0%20TO%20200]&rows=100&fl=price,scale(sum(price,query($q)),%200,%201),manu_id_s
> >
> > {
> >   "responseHeader":{
> >     "status":0,
> >     "QTime":8,
> >     "params":{
> >       "q":"*:*",
> >       "fl":"price,scale(sum(price,query($q)), 0, 1)",
> >       "fq":"popularity:(1 OR 7)",
> >       "rows":"100"}},
> >   "response":{"numFound":6,"start":0,"numFoundExact":true,"docs":[
> >       {
> >         "price":74.99,
> >         "scale(sum(price,query($q)), 0, 1)":0.034101862},
> >       {
> >         "price":19.95,
> >         "scale(sum(price,query($q)), 0, 1)":0.009072306},
> >       {
> >         "price":11.5,
> >         "scale(sum(price,query($q)), 0, 1)":0.0052296496},
> >       {
> >         "price":329.95,
> >         "scale(sum(price,query($q)), 0, 1)":0.15004548},
> >       {
> >         "price":479.95,
> >         "scale(sum(price,query($q)), 0, 1)":0.2182583},
> >       {
> >         "price":649.99,
> >         "scale(sum(price,query($q)), 0, 1)":0.29558435}]
> >   }}
> >
> > I can avoid this problem by adding a new parameter query($fq) to the scale
> > function, but this solution is cumbersome and not maintainable.
> > For example:
> >
> > http://localhost:8983/solr/techproducts/select?q=manu_id_s:(corsair%20belkin%20canon%20viewsonic)&fq=price:[0%20TO%20200]&rows=100&fl=price,scale(sum(sum(price,query($q)),query($fq)),%200,%201),manu_id_s
> >
> > {
> >   "responseHeader":{
> >     "status":0,
> >     "QTime":1,
> >     "params":{
> >       "q":"manu_id_s:(corsair belkin canon viewsonic)",
> >       "fl":"price,scale(sum(sum(price,query($q)),query($fq)), 0, 1),manu_id_s",
> >       "fq":"price:[0 TO 200]",
> >       "rows":"100"}},
> >   "response":{"numFound":5,"start":0,"numFoundExact":true,"docs":[
> >       {
> >         "manu_id_s":"belkin",
> >         "price":19.95,
> >         "scale(sum(sum(price,query($q)),query($fq)), 0, 1)":0.048746154},
> >       {
> >         "manu_id_s":"belkin",
> >         "price":11.5,
> >         "scale(sum(sum(price,query($q)),query($fq)), 0, 1)":0.0},
> >       {
> >         "manu_id_s":"canon",
> >         "price":179.99,
> >         "scale(sum(sum(price,query($q)),query($fq)), 0, 1)":0.97198087},
> >       {
> >         "manu_id_s":"corsair",
> >         "price":185.0,
> >         "scale(sum(sum(price,query($q)),query($fq)), 0, 1)":1.0},
> >       {
> >         "manu_id_s":"corsair",
> >         "price":74.99,
> >         "scale(sum(sum(price,query($q)),query($fq)), 0, 1)":0.3653772}]
> >   }}
> >
> > On Tue, May 31, 2022 at 2:48 PM Mikhail Khludnev <[email protected]> wrote:
> >
> > > Hello Vincenzo,
> > >
> > > I'm not getting your point:
> > >
> > > > if I add an fq parameter the scale function still continues to work only on
> > > > the q param .
> > >
> > > well, but the function actually refers to q param:
> > > scale(sum(price,query($q)), 0, 1).
> > >
> > > What's your expectation values of query($q) with "q":"popularity:(1 OR
> > > 7)"? I suggest to check it with fl=score
> > >
> > > On Tue, May 31, 2022 at 2:05 PM Vincenzo D'Amore <[email protected]> wrote:
> > >
> > > > Hi all,
> > > >
> > > > playing with the solr scale function I found a few corner cases where I
> > > > need to scale only the results set.
> > > >
> > > > I found a workaround that works but it does not seem to be viable, because
> > > > if I add an fq parameter the scale function still continues to work only on
> > > > the q param .
> > > >
> > > > For example with q=popularity:(1 OR 7):
> > > >
> > > > http://localhost:8983/solr/techproducts/select?q=popularity:(1 OR 7)&rows=100&fl=price,scale(sum(price,query($q)), 0, 1)
> > > >
> > > > {
> > > >   "responseHeader":{
> > > >     "status":0,
> > > >     "QTime":1,
> > > >     "params":{
> > > >       "q":"popularity:(1 OR 7)",
> > > >       "fl":"price,scale(sum(price,query($q)), 0, 1)",
> > > >       "rows":"100"}},
> > > >   "response":{"numFound":6,"start":0,"numFoundExact":true,"docs":[
> > > >       {
> > > >         "price":74.99,
> > > >         "scale(sum(price,query($q)), 0, 1)":0.099437736},
> > > >       {
> > > >         "price":19.95,
> > > >         "scale(sum(price,query($q)), 0, 1)":0.013234352},
> > > >       {
> > > >         "price":11.5,
> > > >         "scale(sum(price,query($q)), 0, 1)":0.0},
> > > >       {
> > > >         "price":329.95,
> > > >         "scale(sum(price,query($q)), 0, 1)":0.49875492},
> > > >       {
> > > >         "price":479.95,
> > > >         "scale(sum(price,query($q)), 0, 1)":0.7336842},
> > > >       {
> > > >         "price":649.99,
> > > >         "scale(sum(price,query($q)), 0, 1)":1.0}]
> > > >   }}
> > > >
> > > > but moving the filter in fq:
> > > >
> > > > http://localhost:8983/solr/techproducts/select?q=*:*&fq=popularity:(1%20OR%207)&rows=100&fl=price,scale(sum(price,query($q)),%200,%201)
> > > >
> > > > {
> > > >   "responseHeader":{
> > > >     "status":0,
> > > >     "QTime":8,
> > > >     "params":{
> > > >       "q":"*:*",
> > > >       "fl":"price,scale(sum(price,query($q)), 0, 1)",
> > > >       "fq":"popularity:(1 OR 7)",
> > > >       "rows":"100"}},
> > > >   "response":{"numFound":6,"start":0,"numFoundExact":true,"docs":[
> > > >       {
> > > >         "price":74.99,
> > > >         "scale(sum(price,query($q)), 0, 1)":0.034101862},
> > > >       {
> > > >         "price":19.95,
> > > >         "scale(sum(price,query($q)), 0, 1)":0.009072306},
> > > >       {
> > > >         "price":11.5,
> > > >         "scale(sum(price,query($q)), 0, 1)":0.0052296496},
> > > >       {
> > > >         "price":329.95,
> > > >         "scale(sum(price,query($q)), 0, 1)":0.15004548},
> > > >       {
> > > >         "price":479.95,
> > > >         "scale(sum(price,query($q)), 0, 1)":0.2182583},
> > > >       {
> > > >         "price":649.99,
> > > >         "scale(sum(price,query($q)), 0, 1)":0.29558435}]
> > > >   }}
> > > >
> > > > On the other hand, I was thinking of implementing a custom scale function
> > > > that by default works only on the current result set and not on the entire
> > > > collection.
> > > >
> > > > Any suggestions on how to solve this problem?
> > > >
> > > > Best regards,
> > > > Vincenzo
> > > >
> > > > --
> > > > Vincenzo D'Amore
> > >
> > > --
> > > Sincerely yours
> > > Mikhail Khludnev
> >
> > --
> > Vincenzo D'Amore
>
> --
> Sincerely yours
> Mikhail Khludnev

--
Vincenzo D'Amore
