Re: Nested JSON Facets (Subfacets)

2016-12-15 Thread CA
Hi Yonik,

thank you for your quick reply.

(I just sent my original e-mail a second time. I had not confirmed the 
subscription, so I thought it might not have been sent the first time. I’m 
sorry.)

We are using Solr 6.1.0. Sorry, I should have mentioned that.

The low number is because of the test data. It’s not what it would look like in 
production. That’s also why I never wondered about the 0 values in the 
beginning. But now that I have tweaked the data, I can see that it’s not 
returning the values it should. In production there are values > 0 as 
expected, but sum() returns 0 nevertheless, which is how we know that 
something is wrong.

In production the data is re-indexed constantly. However, we might have changed 
the field type from int to float, and I’m not sure whether we really 
re-indexed from scratch in production after that. I think I did re-create the 
index in my local env. I will check this.

I’ll also play around with the range query, thanks for the tip!

Cheers,
Chantal



> That should work... what version of Solr are you using?  Did you 
> change the type of the popularity field w/o completely reindexing? 
> 
> You can try to verify the number of documents in each bucket that have 
> the popularity field by adding another sub-facet next to cat_pop: 
> num_pop:{query:"popularity:[* TO *]"} 
> 
> > A quick check with this json.facet parameter: 
> > 
> > json.facet: {cat_pop:"sum(popularity)"} 
> > 
> > returns: 
> > 
> > "facets": { 
> > "count":2508, 
> > "cat_pop":21.0}, 
> 
> That looks like a pretty low sum for all those documents... perhaps 
> most of them are missing "popularity" (or have a 0 popularity). 
> To test one of the buckets at the top-level this way, you could add 
> fq=shop_cat:"Men > Clothing > Jumpers & Cardigans" 
> and see if you get anything. 
> 
> -Yonik 
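
The check suggested above can be sketched in Python as follows. This is only a sketch under assumptions: the helper names build_facet_param and suspicious_buckets are made up for illustration, and the response shape of the query sub-facet (a nested object with a "count" key) is assumed rather than taken from this thread.

```python
import json


def build_facet_param(field="shop_cat", stat_field="popularity"):
    """Return a json.facet string with cat_pop plus the num_pop sanity check."""
    facet = {
        field: {
            "type": "terms",
            "field": field,
            "facet": {
                # Per-bucket sum, as in the original request.
                "cat_pop": f"sum({stat_field})",
                # Per-bucket count of documents that have the field at all.
                "num_pop": {"query": f"{stat_field}:[* TO *]"},
            },
        }
    }
    return json.dumps(facet)


def suspicious_buckets(facets, name="shop_cat"):
    """Return bucket values where documents carry the field but the sum is 0.

    Note: this also flags buckets whose values really are all 0.
    Assumes each bucket looks like {"val": ..., "cat_pop": ..., "num_pop": {"count": N}}.
    """
    buckets = facets.get(name, {}).get("buckets", [])
    return [
        b["val"]
        for b in buckets
        if b.get("num_pop", {}).get("count", 0) > 0 and b.get("cat_pop", 0.0) == 0.0
    ]
```

The string returned by build_facet_param() would be sent as the json.facet request parameter; the "facets" object of the response can then be passed to suspicious_buckets().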



Nested JSON Facets (Subfacets)

2016-12-15 Thread CA
Hi all,

this is about using a function in nested facets, specifically the "sum()" 
function inside a "terms" facet using the json.facet API.

My json.facet parameter looks like this:

   json.facet={shop_cat: {type:terms, field:shop_cat, facet: {cat_pop:"sum(popularity)"}}}

A snippet of the result:

   "facets": {
     "count":2508,
     "shop_cat": {
       "buckets": [{
         "val": "Men > Clothing > Jumpers & Cardigans",
         "count":252,
         "cat_pop":0.0
       }, {
         "val":"Men > Clothing > Jackets & Coats",
         "count":157,
         "cat_pop":0.0
       }, // and more

This looks fine overall, but it turns out that "cat_pop", the result of 
"sum(popularity)", is always 0.0 even if the documents for this facet value have 
popularities > 0.

A quick check with this json.facet parameter:

   json.facet: {cat_pop:"sum(popularity)"}

returns:

   "facets": {
     "count":2508,
     "cat_pop":21.0},

To me, it seems to work fine at the base level but not when nested. Still, 
Yonik’s documentation and the Jira issues indicate that it is possible to use 
functions in nested facets, so I might just be using the wrong structure? I am 
having a hard time finding any other examples on the internet, and I had no 
luck changing the structure around.
Could someone shed some light on this for me? It would also help to know if it 
is not possible to sum the values up this way.

Thanks a lot!
Chantal




Re: Find part of long query in shorter fields

2016-07-21 Thread CA
Hi Ahmet!

Thank you for that information. I was wondering whether dismax is kind of 
"deprecated" or, if not, when I would use dismax in preference to edismax.
The documentation sounds to me like "edismax is dismax+: it does everything 
dismax does, and more".

Chantal

On 21.07.2016 at 14:43, Ahmet Arslan wrote:

> Hi,
> 
> If you want to disable operators altogether please use dismax instead of 
> edismax.
> In dismax, only + and - unary operators are supported, if i am not wrong.
> I don't remember the situation of quotations for the phrase query.
> 
> Ahmet
> 



Re: Find part of long query in shorter fields

2016-07-19 Thread CA
Just for the record:

After realizing that with "defType=dismax" I really do get the expected output, 
I’ve found out what I need to change in my edismax configuration:

false

Then this will work:
> q=Braun Series 9 9095CC Men's Electric Shaver Wet/Dry with Clean and Renew 
> Charger
> // edismax with qf/pf: "name" and "brand" fields
> 
Not returned anymore:
> name: "Braun Series Clean CCR2 Cleansing Dock Cartridges Lemonfresh 
> Formula Cartrige (Compatible with Series 7,5,3) 2 pc"
> brand: Braun
> 
Best hit:
> name: "Braun 9095cc Series 9 Electric Shaver"
> brand: Braun


Actually, as I’d like to disable operators in the query altogether (if 
possible), I’m wondering whether I should not be using the old dismax in the 
first place.

Cheers,
Chantal

Re: Find part of long query in shorter fields

2016-07-18 Thread CA
Hi Ahmet,


thank you for the link. It helped me to find more resources.

What I still don’t understand, though, is why edismax returns one of the 
documents with a partial hit and not the other:


q=Braun Series 9 9095CC Men's Electric Shaver Wet/Dry with Clean and Renew 
Charger
// edismax with qf/pf: "name" and "brand" fields

HIT:
name: "Braun Series Clean CCR2 Cleansing Dock Cartridges Lemonfresh 
Formula Cartrige (Compatible with Series 7,5,3) 2 pc"
brand: Braun

NOT A HIT:
name: "Braun 9095cc Series 9 Electric Shaver"
brand: Braun

(For explainOther, schema, and solrconfig, see my previous e-mail.)


I’m still thinking that if I could understand what is happening then it would 
help me figure out what the solution for my use case is. Maybe edismax would be 
perfectly fine with the right combination of fieldtypes and config values?


Thanks for your input!
Chantal






Find part of long query in shorter fields

2016-07-16 Thread CA
Hello all,

our index contains product offers from online shops. The fields we are indexing 
all have rather short values: the name of the product, the brand, the price, 
the category, and some fields containing identifiers like ASIN, GTIN etc. if 
available. We do not index the description texts.

The regular user search uses "edismax" and queries the above-mentioned 
fields, which works fine for short inputs like "iphone 6s".

Now we have to support a different kind of query. It won’t be user input; 
instead it uses complete product names like those we store ourselves, but not 
necessarily names that are actually part of our data set. This means that the 
input query can be relatively long. The output of the query is planned to be a 
More Like This list. So, in effect, the query should have at least one hit that 
is hopefully close enough, and the actual result will be a More Like This list 
sourced from that one hit.

I have tried to get this to work based on the "edismax" setup for the regular 
user search, but this does not work well when the input is longer than the name 
we have stored for a similar product. Here is an example:


## Step 1: Input (not stored in our index):
"Braun Series 9 9095CC Men's Electric Shaver Wet/Dry with Clean and Renew 
Charger" (input to edismax without quotes)

(a) This input does not produce any results with our current edismax config 
(details at the end of the e-mail).
(b) When I relax the "mm" parameter to "2<-1 5<-30% 8<10%", I get one hit with 
the following name:
=> "Braun Series Clean CCR2 Cleansing Dock Cartridges Lemonfresh Formula 
Cartrige (Compatible with Series 7,5,3) 2 pc"
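
To get a feeling for what the relaxed mm value in (b) means for inputs of different lengths, here is a rough Python sketch of how such a conditional spec is evaluated, following the documented semantics (each "n<rule" applies only when there are more than n optional clauses, and percentages are rounded down). The function names are made up for illustration, and the clause counts are simple whitespace token counts; the real number of clauses depends on the analysis chain, so treat the numbers as an approximation.

```python
def _apply(clause_count, rule):
    """Apply a single mm rule such as "-1", "3", "-30%" or "10%"."""
    if rule.endswith("%"):
        amount = clause_count * int(rule[:-1]) / 100
    else:
        amount = int(rule)
    # int() truncates toward zero, so "-30%" of 7 clauses lets 2 be missing.
    if amount < 0:
        return clause_count + int(amount)
    return int(amount)


def min_should_match(clause_count, spec):
    """Return how many optional clauses must match for the given mm spec."""
    required = clause_count  # at or below the first bound, all clauses are required
    for part in spec.split():
        if "<" in part:
            bound, rule = part.split("<", 1)
            if clause_count <= int(bound):
                break  # this condition and all later ones do not apply
        else:
            rule = part
        required = _apply(clause_count, rule)
    return max(0, min(required, clause_count))


MM = "2<-1 5<-30% 8<10%"
long_input = "Braun Series 9 9095CC Men's Electric Shaver Wet/Dry with Clean and Renew Charger"
short_input = "Braun Series 9 9095CC Men's Electric Shaver"

for text in (long_input, short_input):
    n = len(text.split())
    print(n, "tokens ->", min_should_match(n, MM), "must match")
```

Under these assumptions the 13-token input falls under the "8<10%" rule and needs only one matching clause, while the 7-token input falls under "5<-30%" and needs five.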


## Step 2: When I reduce the input manually to the following:
"Braun Series 9 9095CC Men's Electric Shaver"

The above shortened input returns a very good hit with the name:
=> "Braun 9095cc Series 9 Electric Shaver"


My Question:

Is it possible, and if so how, to have the query input:
"Braun Series 9 9095CC Men's Electric Shaver Wet/Dry with Clean and Renew 
Charger" (input to edismax without quotes)
return (also or only) the hit with the name:
=> "Braun 9095cc Series 9 Electric Shaver"
and maybe even give it a high score?

I have tried to use "explainOther" (see the output at the end of this e-mail), 
but I have a really hard time reading it. In some cases, I’m not even able to 
understand where one clause ends and the next one starts (is it possible to 
have it returned in several lines?). Maybe someone can give me a hint on how to 
use that output, or knows of some documentation on the internet that explains 
how to make good use of it?
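
One possible way to make the explain output easier to read is sketched below. It assumes that Solr's debug.explain.structured and indent request parameters are available in the version in use, and that a structured explain node carries "value", "description" and an optional "details" list; the function names are hypothetical, and the document id is the one given in the debug section of this e-mail.

```python
from urllib.parse import urlencode, parse_qs


def explain_params(query, other_id):
    """Request parameters asking for a structured explain of another document."""
    return {
        "q": query,
        "defType": "edismax",
        "debugQuery": "true",
        "explainOther": f"id:{other_id}",
        # Assumed to return the explanation as nested objects instead of one string.
        "debug.explain.structured": "true",
        "indent": "true",
    }


def flatten(node, depth=0):
    """Yield one indented line per node of a structured explain tree."""
    yield f'{"  " * depth}{node.get("value")} = {node.get("description")}'
    for child in node.get("details", []):
        yield from flatten(child, depth + 1)


query_string = urlencode(
    explain_params(
        "Braun Series 9 9095CC Men's Electric Shaver Wet/Dry with Clean and Renew Charger",
        "2d617cee76f5ed8598cf7db1b44a40de6f3c8c9b",
    )
)
```

The query_string would be appended to the select handler URL, and each structured explain object from the debug section of the response could be printed with "\n".join(flatten(node)).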


Looking at the input string, I was wondering:

(A) Is relaxing the "mm" parameter really the way to go?
(B) Should I create another name field in schema.xml with a different query 
chain that discards the last words of a query input if it is too long? 
Or maybe it’s possible to make tokens in the first part of the input more 
"important" (though I’m not sure this is generally the case)? Should I remove 
some of the filters from the query chain (like the ShingleFilter)?
(C) Can I configure something else, or should I not use edismax for this?


Thank you for reading this,
any insight is highly appreciated!

Chantal


***

Following are the field configuration for the name field, the configuration of 
the edismax handler, and the output of "explainOther" for the above example.



SCHEMA.XML — "name" field:














SOLRCONFIG.XML — MLT/EDISMAX

 
 
 all
 edismax

 *:*
 id,brand,name,price,score,popularity
 0.1
 brand_split^6 name
 brand_split^10 name^10
 2<-1 5<-30% 8<10%
 10
 20

 xml

 false
 brand_split^6 name price
 brand_split name price
 details
 
 



DEBUG — EXPLAIN OTHER

The "other" document with id:2d617cee76f5ed8598cf7db1b44a40de6f3c8c9b has the 
title "Braun 9095cc Series 9 Electric Shaver"



edismax
brand_split^6 name
brand_split^10 name^10
2<-1 5<-30% 8<10%
10
20
0.1

Braun Series 9 9095CC Men's Electric Shaver Wet/Dry with Clean 
and Renew Charger

id:2d617cee76f5ed8598cf7db1b44a40de6f3c8c9b





Braun Series Clean CCR2 Cleansing Dock Cartridges 
Lemonfresh Formula Cartrige (Compatible with
Series 7,5,3) 2 pc

773d4bdb341c4dc438c481ac80de5abde08d85bf
Braun
97.122955




Braun Series 9 9095CC Men's Electric Shaver Wet/Dry with Clean and 
Renew Charger


Braun Series 9 9095CC Men's Electric Shaver Wet/Dry with Clean and 
Renew Charger


(+(DisjunctionMaxQuery((name:braun | (brand_split:braun)^6.0)~0.1)