Re: [nesting] JSON Facet API vs. BlockJoin Faceting: need help on queries (Facet API facets by wrong doc level VS. BlockJoin Faceting does not return top 10 most frequent)

Alisa Z . Mon, 28 Mar 2016 14:20:32 -0700

OK, for the first question I think I'm getting closer: adding facet: 
{top_terms_by_doc: "unique(_root_)"} to the terms facet, as described at 
http://blog.griddynamics.com/search/label/~Mikhail%20Khludnev , returns correct 
per-document counts. However, the buckets are still sorted by the terms 
facet's own "count", not by the unique(_root_) value:



curl http://localhost:8985/solr/my_collection/query -d 
'q={!parent%20which="type_s:doc"}type_s:doc.userData%20%2BSubject_t:california&rows=0&
json.facet={
  filter_by_child_type :{
    type:query,
    q:"type_s:doc.enriched.text.keywords",
    domain: { blockChildren : "type_s:doc" },
    facet:{
      top_keywords_text : {
        type: terms,
        field: text_t,
        limit: 10,
        facet: {
           top_terms_by_doc: "unique(_root_)"
         }
      }
    }
  }
}'

RETURNS 

{
  "responseHeader":{
    "status":0,
    "QTime":25,
    "params":{
      "q":"{!parent which=\"type_s:doc\"}type_s:doc.userData +Subject_t:california",
      "json.facet":"{\n  filter_by_child_type :{\n    type:query,\n    q:\"type_s:doc.enriched.text.keywords\",\n    domain: { blockChildren : \"type_s:doc\" },\n    facet:{\n      top_keywords_text : {\n        type: terms,\n        field: text_t,\n        limit: 10,\n        facet: {\n           top_terms_by_doc: \"unique(_root_)\"\n         }\n      }\n    }\n  }\n}",
      "rows":"0"}},
  "response":{"numFound":19,"start":0,"docs":[]
  },
  "facets":{
    "count":19,
    "filter_by_child_type":{
      "count":686,
      "top_keywords_text":{
        "buckets":[{
            "val":"enron",
            "count":57,
            "top_terms_by_doc":9},
          {
            "val":"california",
            "count":22,
            "top_terms_by_doc":13},
          {
            "val":"power",
            "count":21,
            "top_terms_by_doc":7},
          {
            "val":"rate",
            "count":15,
            "top_terms_by_doc":5},
          {
            "val":"plan",
            "count":13,
            "top_terms_by_doc":3},
          {
            "val":"hou",
            "count":12,
            "top_terms_by_doc":5},
          {
            "val":"energy",
            "count":11,
            "top_terms_by_doc":5},
          {
            "val":"na",
            "count":11,
            "top_terms_by_doc":5},
          {
            "val":"mckinsey",
            "count":10,
            "top_terms_by_doc":1},
          {
            "val":"socal",
            "count":10,
            "top_terms_by_doc":4}]}}}}

Nice, but I want the buckets to be ordered by the "top_terms_by_doc" values, 
not by the "count" values. 
Any suggestions?
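
One direction that might work, as an untested sketch: the JSON Facet API terms facet takes a "sort" parameter that can name a statistic computed in the bucket, so "top_terms_by_doc desc" may give the ordering I am after. Whether unique(_root_) can be used as a sort key on this Solr version is an assumption. The Python below only builds the request body; it does not contact the server, and the variable names are mine:

```python
import json
from urllib.parse import urlencode

# Same facet as in the curl command above, plus a "sort" on the nested statistic.
facet = {
    "filter_by_child_type": {
        "type": "query",
        "q": "type_s:doc.enriched.text.keywords",
        "domain": {"blockChildren": "type_s:doc"},
        "facet": {
            "top_keywords_text": {
                "type": "terms",
                "field": "text_t",
                "limit": 10,
                # Assumption: sort by the nested stat instead of the default count.
                "sort": "top_terms_by_doc desc",
                "facet": {"top_terms_by_doc": "unique(_root_)"},
            }
        },
    }
}

params = {
    "q": '{!parent which="type_s:doc"}type_s:doc.userData +Subject_t:california',
    "rows": 0,
    "json.facet": json.dumps(facet),
}

# URL-encoded form body, suitable for POSTing to the /query handler.
body = urlencode(params)
print(body)
```

If this works, the limit of 10 would presumably apply after sorting by the distinct-parent count, so the result could contain terms that are absent from the count-sorted top 10.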

Thanks,
Alisa 





>Monday, 28 March 2016, 15:39 -04:00 from Alisa Z. <prol...@mail.ru>:
>
>Hi all, 
>
>I am trying to facet parent documents by fields of their nested (child) 
>documents. I've tried the two approaches named in the subject: with the first 
>the results are not quite correct, and with the second I cannot get the query 
>right. I need help with either of them, and any explanation, documentation, or 
>blog posts on this behavior would be much appreciated.   
>
>In words, the query is: "Find the top 10 keywords for all documents with 
>"california" in the email subject line"
>
>Here is the query with responses: 
>
>==== Json Facet API ====  
>
>curl http://localhost:8985/solr/my_collection/query -d 
>'q={!parent%20which="type_s:doc"}type_s:doc.userData%20%2BSubject_t:california&rows=0&
>json.facet={
>  filter_by_child_type :{
>    type:query,
>    q:"type_s:doc.enriched.text.keywords",
>    domain: { blockChildren : "type_s:doc" },
>    facet:{
>      top_keywords_text : {
>        type: terms,
>        field: text_t,
>        limit: 10
>      }
>    }
>  }
>}'
>
>RETURNS:  
>
>{
>  "responseHeader":{
>    "status":0,
>    "QTime":134,
>    "params":{
>      "q":"{!parent which=\"type_s:doc\"}type_s:doc.userData +Subject_t:california",
>      "json.facet":"{\n  filter_by_child_type :{\n    type:query,\n    q:\"type_s:doc.enriched.text.keywords\",\n    domain: { blockChildren : \"type_s:doc\" },\n    facet:{\n      top_keywords_text : {\n        type: terms,\n        field: text_t,\n        limit: 10\n      }\n    }\n  }\n}",
>      "rows":"0"}},
>  "response":{"numFound":19,"start":0,"docs":[]
>  },
>  "facets":{
>    "count":19,
>    "filter_by_child_type":{
>      "count":686,
>      "top_keywords_text":{
>        "buckets":[{
>            "val":"enron",
>            "count":57},
>          {
>            "val":"california",
>            "count":22},
>          {
>            "val":"power",
>            "count":21},
>          {
>            "val":"rate",
>            "count":15},
>          {
>            "val":"plan",
>            "count":13},
>          {
>            "val":"hou",
>            "count":12},
>          {
>            "val":"energy",
>            "count":11},
>          {
>            "val":"na",
>            "count":11},
>          {
>            "val":"mckinsey",
>            "count":10},
>          {
>            "val":"socal",
>            "count":10}]}}}}
>
>
>QUESTION: Where do the counts greater than 19 (the total number of 
>top-level documents returned by the query) come from? How can I adjust the 
>query to facet only on the top-level documents (so that no count is 
>greater than 19)? 
>
>
>===== BlockJoin Faceting ====== 
>Following the example on  
>https://cwiki.apache.org/confluence/display/solr/BlockJoin+Faceting , I've 
>tried this:  
>
>/bjqfacet?q={!parent%20which=type_s:doc}type_s:doc.enriched.text.keywords&child.facet.field=text_t&child.facet.limit=10&child.facet.mincount=5&rows=0&fq={!parent%20which=type_s:doc}type_s:doc.userData%20%2BSubject_t:california&wt=json&indent=true
>
>RETURNS: 
>
>{
>  "responseHeader":{
>    "status":0,
>    "QTime":1},
>  "response":{"numFound":19,"start":0,"docs":[]
>  },
>  "facet_counts":[
>    "facet_fields",[
>      "text_t",[
>        "128x",1,
>        "18xx",1,
>        "1x",1,
>        "2",2,
>        "30",1,
>        "60",1,
>        "78xx",1,
>        "82xx",1,
>        "ab",2,
>        "access",5,
>        "account",1,
>        "accounts",1,
>...
>"california",13,
>...
>"enron",9,
>...
>]]]}
>
>QUESTION: This looks very close to what I want, but why are 
>child.facet.limit=10&child.facet.mincount=5 ignored? How can I get the top 10 
>most frequent terms? 
>
>
>Thank you for your help in advance! 
>
>-- 
>Alisa Zhila
