Hi, Pat

How is it going with groupby issuie?

2011/3/23 Pat Allan <[email protected]>

> And I should look at your code first before stating that - as you're doing
> that too. Damn, thought I had it figured out...
>
> Let me ponder some more.
>
> --
> Pat
>
> On 23/03/2011, at 11:06 AM, Pat Allan wrote:
>
> > Ah, I think I know what the problem is... if you look at the facet code
> in TS, you'll find both limit and max_matches are being set to either 1000
> (Sphinx's default) or the custom value (sometimes larger) so Sphinx looks at
> the widest possible set of results to figure out the values.
> >
> >
> https://github.com/freelancing-god/thinking-sphinx/blob/master/lib/thinking_sphinx/facet_search.rb#L59-60
> >
> > Try doing the same for your facet search - as it's possible Sphinx just
> isn't getting deep enough into the result set to find other combinations.
> >
> > --
> > Pat
> >
> > On 23/03/2011, at 10:55 AM, Viacheslav Dushin wrote:
> >
> >> Hi, Pat
> >>
> >> This indexer is very simplified version of thinking sphinx.
> >>
> >> facets method is in:
> >>
> >> mongoid-sphinx/lib/mongoid_sphinx/mongoid/sphinx.rb
> >>
> >>      def facets(*args)
> >>        options = args.extract_options!
> >>        query = args.join(" ")
> >>        MongoidSphinx::FacetSearch.new(query,self, options).facets
> >>      end
> >>
> >> Facet search creates array of bundle searches (facet_search.rb)
> >> Each search in this array is grouped_by different facet attribute
> >>    def search
> >>      return if class_name.facet_attributes.blank?
> >>      bundled_search = BundledSearch.new
> >>      class_name.facet_attributes.each do |attribute|
> >>        bundled_search.search(query, class_name,
> facet_search_options(attribute))
> >>      end
> >>      bundled_search
> >>    end
> >>
> >> after that all results are mapped in facet has
> >>
> >>    def facets
> >>      self.search.results
> >>       res = {}
> >>       self.search.results.each_with_index do |result, index|
> >>         attr_name = class_name.facet_attributes[index].to_s
> >>         res[attr_name] = result[:matches].map{|o|
> [o[:attributes][attr_name], o[:attributes]["@count"]]}.to_hash
> >>       end
> >>       res
> >>    end
> >>
> >>
> >> This code is based on Thinking Sphinx
> >>
> >> see full code in attachment
> >>
> >>
> >> Also I noted that group_by in thinking sphinx returns incorrect results
> too:
> >> Restaurant.search("pizza", :group_by=> :has_menu) -- returns only one
> result.
> >> debugging showed that:
> >> result[:matches].length == 1
> >> but when I run
> >> Restaurant.facets("pizza")
> >> debugging shows that
> >> results[0][:has_menu].length == 2
> >> this is correct
> >>
> >> As far as I understand, thinking sphinx uses group_by parameter, to calc
> facets. But why these results are different?
> >>
> >>
> >> Thanks
> >>
> >> 2011/3/23 Pat Allan <[email protected]>
> >> Hi Viacheslav
> >>
> >> Can you run me through the code you're using to make these queries? It
> does seem like something's wrong, but I need a bit more context.
> >>
> >> --
> >> Pat
> >>
> >> On 23/03/2011, at 2:38 AM, Viacheslav Dushin wrote:
> >>
> >>> Hello,
> >>>
> >>> I'm using latest version of Riddle from Github, Sphinx 0.9.9-release
> >>> (r2117) and xmlpipe2 as datasource for Sphinx.
> >>> I use group by to implement facets (similar to thinking sphinx, but
> >>> for xmlpipe2 datasource)
> >>> There is a problem: group by works incorrectly for "int", "bool" and
> >>> "multi" attributes, but it works ok float attributes.
> >>> Here is an example of output:
> >>>
> >>> grouping by has_menu -- bool:
> >>>
> >>>>> MongoRestaurant.facets("")[0]
> >>> => {:status=>0, :total_found=>1, :attribute_names=>["total_likes",
> >>> "neighborhood_ids", "lon", "has_delivery", "offer_type_ids",
> >>> "has_reservation", "cuisine_ids", "lat", "source_ids", "name_sort",
> >>> "address_zipcode", "has_menu", "offer_ids", "restaurant_ids",
> >>> "price_range", "score", "city_ids", "total_checkins",
> >>> "reviews_avg_score", "bh_id", "@groupby",
> >>> "@count"], :attributes=>{"offer_type_ids"=>1073741825, "lon"=>5,
> >>> "offer_ids"=>1073741825, "@groupby"=>1, "restaurant_ids"=>1,
> >>> "source_ids"=>1073741825, "reviews_avg_score"=>5, "total_checkins"=>1,
> >>> "total_likes"=>1, "@count"=>1, "address_zipcode"=>1,
> >>> "has_delivery"=>4, "city_ids"=>1, "has_menu"=>4,
> >>> "cuisine_ids"=>1073741825, "neighborhood_ids"=>1073741825, "bh_id"=>1,
> >>> "score"=>5, "price_range"=>3, "name_sort"=>3, "lat"=>5,
> >>> "has_reservation"=>4}, :words=>{"400456007"=>{:docs=>70, :hits=>70}},
> :time=>0.0, :fields=>["classnamecrc32",
> >>> "name", "description", "offer_text", "offer_type_value",
> >>> "cuisine_name"], :matches=>[{:doc=>598000,
> :attributes=>{"offer_type_ids"=>[],
> >>> "lon"=>-1.29096734523773, "offer_ids"=>[], "@groupby"=>19700101,
> >>> "restaurant_ids"=>598000, "source_ids"=>[], "reviews_avg_score"=>0.0,
> >>> "total_checkins"=>0, "total_likes"=>0, "@count"=>70,
> >>> "address_zipcode"=>10022, "has_delivery"=>1, "city_ids"=>18819,
> >>> "has_menu"=>1, "cuisine_ids"=>[], "neighborhood_ids"=>[16, 22, 56,
> >>> 60], "bh_id"=>598000, "score"=>0.0, "price_range"=>69,
> >>> "name_sort"=>68, "lat"=>0.711285769939423,
> >>> "has_reservation"=>0}, :index=>0, :weight=>1273}], :total=>1}
> >>>
> >>> grouping by total_likes -- integer:
> >>>
> >>> => {:status=>0, :total_found=>1, :attribute_names=>["total_likes",
> >>> "neighborhood_ids", "lon", "has_delivery", "offer_type_ids",
> >>> "has_reservation", "cuisine_ids", "lat", "source_ids", "name_sort",
> >>> "address_zipcode", "has_menu", "offer_ids", "restaurant_ids",
> >>> "price_range", "score", "city_ids", "total_checkins",
> >>> "reviews_avg_score", "bh_id", "@groupby",
> >>> "@count"], :attributes=>{"offer_type_ids"=>1073741825, "lon"=>5,
> >>> "offer_ids"=>1073741825, "@groupby"=>1, "restaurant_ids"=>1,
> >>> "source_ids"=>1073741825, "reviews_avg_score"=>5, "total_checkins"=>1,
> >>> "total_likes"=>1, "@count"=>1, "address_zipcode"=>1,
> >>> "has_delivery"=>4, "city_ids"=>1, "has_menu"=>4,
> >>> "cuisine_ids"=>1073741825, "neighborhood_ids"=>1073741825, "bh_id"=>1,
> >>> "score"=>5, "price_range"=>3, "name_sort"=>3, "lat"=>5,
> >>> "has_reservation"=>4}, :words=>{"400456007"=>{:docs=>70, :hits=>70}},
> :time=>0.001, :fields=>["classnamecrc32",
> >>> "name", "description", "offer_text", "offer_type_value",
> >>> "cuisine_name"], :matches=>[{:doc=>598000,
> :attributes=>{"offer_type_ids"=>[],
> >>> "lon"=>-1.29096734523773, "offer_ids"=>[], "@groupby"=>19700101,
> >>> "restaurant_ids"=>598000, "source_ids"=>[], "reviews_avg_score"=>0.0,
> >>> "total_checkins"=>0, "total_likes"=>0, "@count"=>70,
> >>> "address_zipcode"=>10022, "has_delivery"=>1, "city_ids"=>18819,
> >>> "has_menu"=>1, "cuisine_ids"=>[], "neighborhood_ids"=>[16, 22, 56,
> >>> 60], "bh_id"=>598000, "score"=>0.0, "price_range"=>69,
> >>> "name_sort"=>68, "lat"=>0.711285769939423,
> >>> "has_reservation"=>0}, :index=>0, :weight=>1273}], :total=>1}
> >>>
> >>>
> >>> grouping by reviews_avg_score -- float
> >>>
> >>> => {:status=>0, :total_found=>9, :attribute_names=>["total_likes",
> >>> "neighborhood_ids", "lon", "has_delivery", "offer_type_ids",
> >>> "has_reservation", "cuisine_ids", "lat", "source_ids", "name_sort",
> >>> "address_zipcode", "has_menu", "offer_ids", "restaurant_ids",
> >>> "price_range", "score", "city_ids", "total_checkins",
> >>> "reviews_avg_score", "bh_id", "@groupby",
> >>> "@count"], :attributes=>{"offer_type_ids"=>1073741825, "lon"=>5,
> >>> "offer_ids"=>1073741825, "@groupby"=>1, "restaurant_ids"=>1,
> >>> "source_ids"=>1073741825, "reviews_avg_score"=>5, "total_checkins"=>1,
> >>> "total_likes"=>1, "@count"=>1, "address_zipcode"=>1,
> >>> "has_delivery"=>4, "city_ids"=>1, "has_menu"=>4,
> >>> "cuisine_ids"=>1073741825, "neighborhood_ids"=>1073741825, "bh_id"=>1,
> >>> "score"=>5, "price_range"=>3, "name_sort"=>3, "lat"=>5,
> >>> "has_reservation"=>4}, :words=>{"400456007"=>{:docs=>70, :hits=>70}},
> :time=>0.001, :fields=>["classnamecrc32",
> >>> "name", "description", "offer_text", "offer_type_value",
> >>> "cuisine_name"], :matches=>[{:doc=>598261,
> :attributes=>{"offer_type_ids"=>[],
> >>> "lon"=>-1.29073655605316, "offer_ids"=>[], "@groupby"=>20040816,
> >>> "restaurant_ids"=>598261, "source_ids"=>[], "reviews_avg_score"=>10.0,
> >>> "total_checkins"=>0, "total_likes"=>0, "@count"=>5,
> >>> "address_zipcode"=>10028, "has_delivery"=>1, "city_ids"=>18819,
> >>> "has_menu"=>1, "cuisine_ids"=>[21], "neighborhood_ids"=>[8, 22, 23,
> >>> 57], "bh_id"=>598261, "score"=>0.0, "price_range"=>69,
> >>> "name_sort"=>64, "lat"=>0.711645185947418,
> >>> "has_reservation"=>0}, :index=>0, :weight=>1273},
> >>> {:doc=>598904, :attributes=>{"offer_type_ids"=>[11],
> >>> "lon"=>-1.29076039791107, "offer_ids"=>[622122], "@groupby"=>20040804,
> >>> "restaurant_ids"=>598904, "source_ids"=>[15],
> >>> "reviews_avg_score"=>9.0, "total_checkins"=>13, "total_likes"=>1,
> >>> "@count"=>3, "address_zipcode"=>10021, "has_delivery"=>1,
> >>> "city_ids"=>18819, "has_menu"=>0, "cuisine_ids"=>[],
> >>> "neighborhood_ids"=>[22, 23, 28, 57], "bh_id"=>598904,
> >>> "score"=>2.4300000667572, "price_range"=>69, "name_sort"=>26,
> >>> "lat"=>0.711563467979431,
> >>> "has_reservation"=>0}, :index=>1, :weight=>1273},
> >>> {:doc=>598488, :attributes=>{"offer_type_ids"=>[], "lon"=>0.0,
> >>> "offer_ids"=>[], "@groupby"=>20040722, "restaurant_ids"=>598488,
> >>> "source_ids"=>[], "reviews_avg_score"=>8.0, "total_checkins"=>0,
> >>> "total_likes"=>0, "@count"=>3, "address_zipcode"=>10012,
> >>> "has_delivery"=>1, "city_ids"=>18819, "has_menu"=>1,
> >>> "cuisine_ids"=>[33], "neighborhood_ids"=>[2, 9, 22, 61],
> >>> "bh_id"=>598488, "score"=>0.0, "price_range"=>69, "name_sort"=>37,
> >>> "lat"=>0.0, "has_reservation"=>0}, :index=>2, :weight=>1273},
> >>> {:doc=>599149, :attributes=>{"offer_type_ids"=>[],
> >>> "lon"=>-1.29116952419281, "offer_ids"=>[], "@groupby"=>20040628,
> >>> "restaurant_ids"=>599149, "source_ids"=>[], "reviews_avg_score"=>7.0,
> >>> "total_checkins"=>22, "total_likes"=>0, "@count"=>2,
> >>> "address_zipcode"=>10017, "has_delivery"=>1, "city_ids"=>18819,
> >>> "has_menu"=>0, "cuisine_ids"=>[28], "neighborhood_ids"=>[6, 22, 60],
> >>> "bh_id"=>599149, "score"=>0.0, "price_range"=>69, "name_sort"=>45,
> >>> "lat"=>0.711319506168365,
> >>> "has_reservation"=>0}, :index=>3, :weight=>1273},
> >>> {:doc=>598304, :attributes=>{"offer_type_ids"=>[],
> >>> "lon"=>-1.52945172786713, "offer_ids"=>[], "@groupby"=>20040604,
> >>> "restaurant_ids"=>598304, "source_ids"=>[], "reviews_avg_score"=>6.0,
> >>> "total_checkins"=>115, "total_likes"=>0, "@count"=>5,
> >>> "address_zipcode"=>60604, "has_delivery"=>1, "city_ids"=>6335,
> >>> "has_menu"=>1, "cuisine_ids"=>[], "neighborhood_ids"=>[240],
> >>> "bh_id"=>598304, "score"=>0.0, "price_range"=>69, "name_sort"=>0,
> >>> "lat"=>0.73090934753418,
> >>> "has_reservation"=>0}, :index=>4, :weight=>1273},
> >>> {:doc=>598791, :attributes=>{"offer_type_ids"=>[],
> >>> "lon"=>-1.29123413562775, "offer_ids"=>[], "@groupby"=>20040511,
> >>> "restaurant_ids"=>598791, "source_ids"=>[], "reviews_avg_score"=>5.0,
> >>> "total_checkins"=>12, "total_likes"=>6, "@count"=>1,
> >>> "address_zipcode"=>10018, "has_delivery"=>1, "city_ids"=>18819,
> >>> "has_menu"=>1, "cuisine_ids"=>[2], "neighborhood_ids"=>[7, 22, 60],
> >>> "bh_id"=>598791, "score"=>0.0, "price_range"=>70, "name_sort"=>7,
> >>> "lat"=>0.711277902126312,
> >>> "has_reservation"=>0}, :index=>5, :weight=>1273},
> >>> {:doc=>598474, :attributes=>{"offer_type_ids"=>[],
> >>> "lon"=>-1.52936661243439, "offer_ids"=>[], "@groupby"=>20040416,
> >>> "restaurant_ids"=>598474, "source_ids"=>[], "reviews_avg_score"=>4.0,
> >>> "total_checkins"=>925, "total_likes"=>5, "@count"=>1,
> >>> "address_zipcode"=>60611, "has_delivery"=>1, "city_ids"=>6335,
> >>> "has_menu"=>1, "cuisine_ids"=>[], "neighborhood_ids"=>[217, 287],
> >>> "bh_id"=>598474, "score"=>0.0, "price_range"=>71, "name_sort"=>10,
> >>> "lat"=>0.731164395809174,
> >>> "has_reservation"=>0}, :index=>6, :weight=>1273},
> >>> {:doc=>598689, :attributes=>{"offer_type_ids"=>[],
> >>> "lon"=>-1.29161155223846, "offer_ids"=>[], "@groupby"=>20040110,
> >>> "restaurant_ids"=>598689, "source_ids"=>[], "reviews_avg_score"=>2.0,
> >>> "total_checkins"=>2, "total_likes"=>0, "@count"=>2,
> >>> "address_zipcode"=>10007, "has_delivery"=>1, "city_ids"=>18819,
> >>> "has_menu"=>1, "cuisine_ids"=>[], "neighborhood_ids"=>[2, 17, 22, 65],
> >>> "bh_id"=>598689, "score"=>0.0, "price_range"=>69, "name_sort"=>15,
> >>> "lat"=>0.71059387922287,
> >>> "has_reservation"=>0}, :index=>7, :weight=>1273},
> >>> {:doc=>598000, :attributes=>{"offer_type_ids"=>[],
> >>> "lon"=>-1.29096734523773, "offer_ids"=>[], "@groupby"=>19700101,
> >>> "restaurant_ids"=>598000, "source_ids"=>[], "reviews_avg_score"=>0.0,
> >>> "total_checkins"=>0, "total_likes"=>0, "@count"=>48,
> >>> "address_zipcode"=>10022, "has_delivery"=>1, "city_ids"=>18819,
> >>> "has_menu"=>1, "cuisine_ids"=>[], "neighborhood_ids"=>[16, 22, 56,
> >>> 60], "bh_id"=>598000, "score"=>0.0, "price_range"=>69,
> >>> "name_sort"=>68, "lat"=>0.711285769939423,
> >>> "has_reservation"=>0}, :index=>8, :weight=>1273}], :total=>9}
> >>>
> >>>
> >>>
> >>> You can easily note that grouping by :has_menu and :total_likes
> >>> returns only one result (:total_found=>1). It is incorrect: there are
> >>> records with :has_menu == false, total_likes = 1, total_likes =2 etc.
> >>> Only group by reviews_avg_score returns correct results
> >>>
> >>>
> >>> Example of xml data source:
> >>> <?xml version="1.0" encoding="utf-8"?>
> >>> <sphinx:docset>
> >>> <sphinx:schema>
> >>> <sphinx:field name="classnamecrc32"/>
> >>> <sphinx:field name="name"/>
> >>> <sphinx:field name="description"/>
> >>> <sphinx:field name="offer_text"/>
> >>> <sphinx:field name="offer_type_value"/>
> >>> <sphinx:field name="cuisine_name"/>
> >>> <sphinx:attr name="address_zipcode" type="int"/>
> >>> <sphinx:attr name="restaurant_ids" type="int"/>
> >>> <sphinx:attr name="lat" type="float"/>
> >>> <sphinx:attr name="has_delivery" type="bool"/>
> >>> <sphinx:attr name="source_ids" type="multi"/>
> >>> <sphinx:attr name="lon" type="float"/>
> >>> <sphinx:attr name="has_reservation" type="bool"/>
> >>> <sphinx:attr name="offer_type_ids" type="multi"/>
> >>> <sphinx:attr name="price_range" type="str2ordinal"/>
> >>> <sphinx:attr name="has_menu" type="bool"/>
> >>> <sphinx:attr name="score" type="float"/>
> >>> <sphinx:attr name="neighborhood_ids" type="multi"/>
> >>> <sphinx:attr name="cuisine_ids" type="multi"/>
> >>> <sphinx:attr name="total_checkins" type="int"/>
> >>> <sphinx:attr name="offer_ids" type="multi"/>
> >>> <sphinx:attr name="reviews_avg_score" type="float"/>
> >>> <sphinx:attr name="city_ids" type="int"/>
> >>> <sphinx:attr name="name_sort" type="str2ordinal"/>
> >>> <sphinx:attr name="total_likes" type="int"/>
> >>> <sphinx:attr name="bh_id" type="int"/>
> >>> </sphinx:schema>
> >>> <sphinx:document id="599105">
> >>> <classnamecrc32>400456007</classnamecrc32>
> >>> <name><![CDATA[Subway]]></name>
> >>> <description><![CDATA[test]]></description>
> >>> <offer_text><![CDATA[]]></offer_text>
> >>> <offer_type_value><![CDATA[]]></offer_type_value>
> >>> <cuisine_name><![CDATA[]]></cuisine_name>
> >>> <address_zipcode>60622</address_zipcode>
> >>> <restaurant_ids>599105</restaurant_ids>
> >>> <lat>0.731224661851994</lat>
> >>> <has_delivery>1</has_delivery>
> >>> <source_ids></source_ids>
> >>> <lon>-1.53024979754365</lon>
> >>> <has_reservation>0</has_reservation>
> >>> <offer_type_ids></offer_type_ids>
> >>> <price_range>1</price_range>
> >>> <has_menu>1</has_menu>
> >>> <score>0.0</score>
> >>> <neighborhood_ids>201,202,284</neighborhood_ids>
> >>> <cuisine_ids></cuisine_ids>
> >>> <total_checkins>5</total_checkins>
> >>> <offer_ids></offer_ids>
> >>> <reviews_avg_score>0</reviews_avg_score>
> >>> <city_ids>6335</city_ids>
> >>> <name_sort>Subway</name_sort>
> >>> <total_likes>0</total_likes>
> >>> <bh_id>599105</bh_id>
> >>> </sphinx:document>
> >>> </sphinx:docset>
> >>>
> >>>
> >>> Thanks, Slava
> >>>
> >>> --
> >>> You received this message because you are subscribed to the Google
> Groups "Thinking Sphinx" group.
> >>> To post to this group, send email to [email protected].
> >>> To unsubscribe from this group, send email to
> [email protected].
> >>> For more options, visit this group at
> http://groups.google.com/group/thinking-sphinx?hl=en.
> >>>
> >>
> >> --
> >> You received this message because you are subscribed to the Google
> Groups "Thinking Sphinx" group.
> >> To post to this group, send email to [email protected].
> >> To unsubscribe from this group, send email to
> [email protected].
> >> For more options, visit this group at
> http://groups.google.com/group/thinking-sphinx?hl=en.
> >>
> >>
> >>
> >> --
> >> You received this message because you are subscribed to the Google
> Groups "Thinking Sphinx" group.
> >> To post to this group, send email to [email protected].
> >> To unsubscribe from this group, send email to
> [email protected].
> >> For more options, visit this group at
> http://groups.google.com/group/thinking-sphinx?hl=en.
> >> <mongoid-sphinx.zip>
> >
> > --
> > You received this message because you are subscribed to the Google Groups
> "Thinking Sphinx" group.
> > To post to this group, send email to [email protected].
> > To unsubscribe from this group, send email to
> [email protected].
> > For more options, visit this group at
> http://groups.google.com/group/thinking-sphinx?hl=en.
> >
>
> --
> You received this message because you are subscribed to the Google Groups
> "Thinking Sphinx" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/thinking-sphinx?hl=en.
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Thinking Sphinx" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/thinking-sphinx?hl=en.

Reply via email to