Re: Nested JSON Facets (Subfacets)
Hi Yonik,

thank you for your quick reply. (I just sent my original e-mail a second time. I had not confirmed the subscription, so I thought it might not have been sent the first time. I'm sorry.)

We are using Solr 6.1.0. Sorry, I should have mentioned that.

The low number is because of the test data; it is not what it would look like in production. That is also why I did not wonder about the 0 values in the beginning. But now that I have tweaked the data, I can see that it is not returning the values it should. In production there are values > 0 as expected, but sum() returns 0 nevertheless, which is how we know that something is wrong.

In production the data is re-indexed constantly. We might, however, have changed the field type from int to float. I'm not sure whether we really re-indexed from scratch after that in production, but I think I did re-create the index in my local env. I will check this.

I'll also play around with the range query, thanks for the tip!

Cheers,
Chantal

> That should work... what version of Solr are you using? Did you
> change the type of the popularity field w/o completely reindexing?
>
> You can try to verify the number of documents in each bucket that have
> the popularity field by adding another sub-facet next to cat_pop:
> num_pop:{query:"popularity:[* TO *]"}
>
> > A quick check with this json.facet parameter:
> >
> > json.facet: {cat_pop:"sum(popularity)"}
> >
> > returns:
> >
> > "facets": {
> > "count":2508,
> > "cat_pop":21.0},
>
> That looks like a pretty low sum for all those documents; perhaps
> most of them are missing "popularity" (or have a 0 popularity).
> To test one of the buckets at the top-level this way, you could add
> fq=shop_cat:"Men > Clothing > Jumpers & Cardigans"
> and see if you get anything.
>
> -Yonik
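The two checks suggested in the quoted reply (a num_pop sub-facet next to cat_pop, and an fq on a single category) could be put together roughly as follows. This is only a sketch: build_facet_params is a hypothetical helper, and q=*:* and rows=0 are assumptions that do not appear in the thread. Only the facet definitions and the fq value are taken from the messages above.

```python
import json
from urllib.parse import urlencode


def build_facet_params(category=None):
    """Build request parameters for the nested facet plus the num_pop check."""
    facet = {
        "shop_cat": {
            "type": "terms",
            "field": "shop_cat",
            "facet": {
                "cat_pop": "sum(popularity)",
                # Sub-facet from the reply: counts, per bucket, the documents
                # that have any value in the popularity field.
                "num_pop": {"query": "popularity:[* TO *]"},
            },
        }
    }
    # q and rows are assumptions made for this sketch.
    params = [("q", "*:*"), ("rows", "0"), ("json.facet", json.dumps(facet))]
    if category is not None:
        # Restrict the result set to one bucket, as suggested in the reply.
        params.append(("fq", 'shop_cat:"%s"' % category))
    return params


params = build_facet_params("Men > Clothing > Jumpers & Cardigans")
print(urlencode(params))
```

If num_pop comes back as 0 for a bucket while count is greater than 0, the documents in that bucket have no indexed popularity value, which would point to the data or the index rather than to the facet structure.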
Nested JSON Facets (Subfacets)
Hi all,

this is about using a function in nested facets, specifically the "sum()" function inside a "terms" facet, using the json.facet API.

My json.facet parameter looks like this:

json.facet={shop_cat: {type:terms, field:shop_cat, facet: {cat_pop:"sum(popularity)"}}}

A snippet of the result:

"facets": {
  "count":2508,
  "shop_cat": {
    "buckets": [{
        "val": "Men > Clothing > Jumpers & Cardigans",
        "count":252,
        "cat_pop":0.0 },
      {
        "val":"Men > Clothing > Jackets & Coats",
        "count":157,
        "cat_pop":0.0 },
      // and more

This looks fine overall, but it turns out that "cat_pop", the result of "sum(popularity)", is always 0.0, even if the documents for this facet value have popularities > 0.

A quick check with this json.facet parameter:

json.facet: {cat_pop:"sum(popularity)"}

returns:

"facets": {
  "count":2508,
  "cat_pop":21.0},

To me, it seems to work fine at the base level but not when nested. Still, Yonik's documentation and the Jira issues indicate that it is possible to use functions in nested facets, so I might just be using the wrong structure. I have a hard time finding any other examples on the internet, and I had no luck changing the structure around.

Could someone shed some light on this for me? It would also help to know if it is not possible to sum the values up this way.

Thanks a lot!
Chantal
Re: Find part of long query in shorter fields
Hi Ahmet!

Thank you for that information. I was wondering whether dismax is kind of "deprecated" or, if not, when I would use dismax in preference to edismax. To me, the documentation sounds like "edismax is dismax+: it does everything dismax does, and more".

Chantal

On 21.07.2016 at 14:43, Ahmet Arslan wrote:

> Hi,
>
> If you want to disable operators altogether please use dismax instead of
> edismax.
> In dismax, only + and - unary operators are supported, if i am not wrong.
> I don't remember the situation of quotations for the phrase query.
>
> Ahmet
>
Re: Find part of long query in shorter fields
Just for the record:

After realizing that with "defType=dismax" I really do get the expected output, I've found out what I need to change in my edismax configuration:

false

Then this will work:

> q=Braun Series 9 9095CC Men's Electric Shaver Wet/Dry with Clean and Renew
> Charger
> // edismax with qf/pf: "name" and "brand" field
> Not returned anymore:
> name: "Braun Series Clean CCR2 Cleansing Dock Cartridges Lemonfresh
> Formula Cartrige (Compatible with Series 7,5,3) 2 pc"
> brand: Braun
> Best hit:
> name: "Braun 9095cc Series 9 Electric Shaver"
> brand: Braun

Actually, as I'd like to disable operators in the query altogether (if possible), I'm wondering whether I should be using the old dismax in the first place.

Cheers,
Chantal
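One way to see where the two parsers diverge is to send the same request twice, changing only defType, and to compare the parsed queries in the debug output. The snippet below is a rough sketch that only builds the two parameter sets. build_params is a hypothetical helper, and the assignment of the values to qf, pf, tie and mm is an assumption based on the configuration values quoted in the first e-mail of this thread; defType, qf, pf, mm, tie and debugQuery themselves are standard Solr request parameters.

```python
from urllib.parse import urlencode

QUERY = ("Braun Series 9 9095CC Men's Electric Shaver Wet/Dry "
         "with Clean and Renew Charger")


def build_params(def_type):
    """Return request parameters that differ only in the query parser."""
    return {
        "q": QUERY,
        "defType": def_type,
        # Assumed mapping of the config values to parameter names.
        "qf": "brand_split^6 name",
        "pf": "brand_split^10 name^10",
        "mm": "2<-1 5<-30% 8<10%",
        "tie": "0.1",
        # Ask Solr to include the parsed query in the response.
        "debugQuery": "true",
    }


for def_type in ("dismax", "edismax"):
    print(urlencode(build_params(def_type)))
```

Comparing the parsed query of both responses shows whether words in the input, such as "and", end up as plain terms or as operators.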
Re: Find part of long query in shorter fields
Hi Ahmet,

thank you for the link. It helped me to find more resources.

What I still don't understand, though, is why edismax returns one of the documents with a partial hit and not the other:

q=Braun Series 9 9095CC Men's Electric Shaver Wet/Dry with Clean and Renew Charger
// edismax with qf/pf: "name" and "brand" field

HIT:
name: "Braun Series Clean CCR2 Cleansing Dock Cartridges Lemonfresh Formula Cartrige (Compatible with Series 7,5,3) 2 pc"
brand: Braun

NOT A HIT:
name: "Braun 9095cc Series 9 Electric Shaver"
brand: Braun

(For explainOther, schema and solrconfig for this, see my previous e-mail.)

I still think that if I could understand what is happening, it would help me figure out the solution for my use case. Maybe edismax would be perfectly fine with the right combination of field types and config values?

Thanks for your input!
Chantal
Find part of long query in shorter fields
Hello all,

our index contains product offers from online shops. The fields we index all have rather short values: the name of the product, the brand, the price, the category, and some fields containing identifiers like ASIN, GTIN etc., if available. We do not index the description texts.

The regular user search uses "edismax" and queries the above-mentioned fields, which works fine for short inputs like "iphone 6s".

Now we have to support a different kind of query, which won't be user input but complete product names, like those we store ourselves but not necessarily names that are actually part of our data set. This means that the input query can be relatively long.

The output of the query is planned to be a More Like This list. So, in effect, the query should have at least one hit that is hopefully close enough, and the actual result will be a More Like This list sourced from that one hit.

I have tried to get this to work based on the "edismax" setup for the regular user search, but this does not work well when the input is longer than what we have stored as a similar product.

Here is an example:

## Step 1:

Input (not stored in our index):
"Braun Series 9 9095CC Men's Electric Shaver Wet/Dry with Clean and Renew Charger" (input to edismax without quotes)

(a) This input does not produce any results with our current edismax config (details at the end of the e-mail).

(b) When I relax the "mm" parameter to "2<-1 5<-30% 8<10%", I get one hit with the following name:
=> "Braun Series Clean CCR2 Cleansing Dock Cartridges Lemonfresh Formula Cartrige (Compatible with Series 7,5,3) 2 pc"

## Step 2:

When I reduce the input manually to the following:
"Braun Series 9 9095CC Men's Electric Shaver"

the shortened input returns a very good hit with the name:
=> "Braun 9095cc Series 9 Electric Shaver"

My question:

Is it possible, and if so how, to have the query input
"Braun Series 9 9095CC Men's Electric Shaver Wet/Dry with Clean and Renew Charger" (input to edismax without quotes)
return (also or only) the hit with the name
=> "Braun 9095cc Series 9 Electric Shaver"
and maybe even give it a high score?

I have tried to use "explainOther" (see the output at the end of this e-mail), but I have a really hard time reading it. In some cases, I'm not even able to tell where one clause ends and the next one starts (is it possible to have it returned on several lines?). Maybe someone can give me a hint on how to use that output, or knows of some documentation on the internet that explains how to make good use of it?

Looking at the input string, I was wondering:

(A) Is relaxing the "mm" parameter really the way to go?

(B) Should I create another name field in schema.xml that basically has a different query chain, discarding the last words of a query input if it is too long? Or is it maybe possible to make tokens in the first part of the input more "important" (though I'm not sure the first part generally matters more)? Should I remove some of the filters from the query chain (like the ShingleFilter)?

(C) Can I configure something else, or should I not use edismax for this?

Thank you for reading this, any insight is highly appreciated!
Chantal

***

Following are the field configuration for the name field, the configuration of the edismax handler, and the output of "explainOther" for the above example.
SCHEMA.XML - "name" field:

SOLRCONFIG.XML - MLT/EDISMAX

all edismax *:* id,brand,name,price,score,popularity 0.1 brand_split^6 name brand_split^10 name^10 2<-1 5<-30% 8<10% 10 20 xml false brand_split^6 name price brand_split name price details

DEBUG - EXPLAIN OTHER

The "other" document with id:2d617cee76f5ed8598cf7db1b44a40de6f3c8c9b has the title "Braun 9095cc Series 9 Electric Shaver"

edismax brand_split^6 name brand_split^10 name^10 2<-1 5<-30% 8<10% 10 20 0.1
Braun Series 9 9095CC Men's Electric Shaver Wet/Dry with Clean and Renew Charger
id:2d617cee76f5ed8598cf7db1b44a40de6f3c8c9b

Braun Series Clean CCR2 Cleansing Dock Cartridges Lemonfresh Formula Cartrige (Compatible with Series 7,5,3) 2 pc
773d4bdb341c4dc438c481ac80de5abde08d85bf
Braun
97.122955

Braun Series 9 9095CC Men's Electric Shaver Wet/Dry with Clean and Renew Charger
Braun Series 9 9095CC Men's Electric Shaver Wet/Dry with Clean and Renew Charger
(+(DisjunctionMaxQuery((name:braun | (brand_split:braun)^6.0)~0.1)
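To get a feeling for what the relaxed mm value "2<-1 5<-30% 8<10%" asks for, the following minimal sketch evaluates such a spec for a given number of optional clauses. It is written from the rules in the Solr documentation for the mm parameter and is not Solr's own code. min_should_match is a hypothetical helper name, and specs with spaces around "<" are not handled. The counts 7 and 13 are simply the whitespace-separated word counts of the shortened and the full input; how many optional clauses edismax really builds depends on the analysis chain and on operator handling.

```python
def min_should_match(clause_count, spec):
    """Return how many of clause_count optional clauses must match."""
    spec = spec.strip()
    result = clause_count
    if "<" in spec:
        # Conditional specs: each one applies only when the clause count
        # is greater than its bound; the last applicable one wins.
        for part in spec.split():
            bound, sub_spec = part.split("<", 1)
            if clause_count <= int(bound):
                return result
            result = min_should_match(clause_count, sub_spec)
        return result
    if spec.endswith("%"):
        # Percentages are truncated toward zero; a negative percentage
        # says how many clauses may be missing.
        raw = clause_count * int(spec[:-1]) / 100
        result = clause_count + int(raw) if raw < 0 else int(raw)
    else:
        value = int(spec)
        result = clause_count + value if value < 0 else value
    # Keep the result within 0..clause_count.
    return max(0, min(clause_count, result))


SPEC = "2<-1 5<-30% 8<10%"
print(min_should_match(7, SPEC))   # 5
print(min_should_match(13, SPEC))  # 1
```

Under these assumptions, the shortened input would need 5 of its 7 clauses to match, while the full input would need only 1 of 13, which fits the observation that the long query matches a document sharing only a few words with it.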