OK. Do you know why it works correctly if I define the attribute as "float"?
2011/3/23 Pat Allan <[email protected]>
> And I should look at your code first before stating that - as you're doing that too. Damn, thought I had it figured out...
>
> Let me ponder some more.
>
> --
> Pat
>
> On 23/03/2011, at 11:06 AM, Pat Allan wrote:
>
> > Ah, I think I know what the problem is... if you look at the facet code in TS, you'll find both limit and max_matches are being set to either 1000 (Sphinx's default) or the custom value (sometimes larger) so Sphinx looks at the widest possible set of results to figure out the values.
> >
> > https://github.com/freelancing-god/thinking-sphinx/blob/master/lib/thinking_sphinx/facet_search.rb#L59-60
> >
> > Try doing the same for your facet search - as it's possible Sphinx just isn't getting deep enough into the result set to find other combinations.
> >
> > --
> > Pat
> >
> > On 23/03/2011, at 10:55 AM, Viacheslav Dushin wrote:
> >
> >> Hi, Pat
> >>
> >> This indexer is very simplified version of thinking sphinx.
> >>
> >> facets method is in:
> >>
> >> mongoid-sphinx/lib/mongoid_sphinx/mongoid/sphinx.rb
> >>
> >> def facets(*args)
> >>   options = args.extract_options!
> >>   query = args.join(" ")
> >>   MongoidSphinx::FacetSearch.new(query,self, options).facets
> >> end
> >>
> >> Facet search creates array of bundle searches (facet_search.rb)
> >> Each search in this array is grouped_by different facet attribute
> >> def search
> >>   return if class_name.facet_attributes.blank?
> >>   bundled_search = BundledSearch.new
> >>   class_name.facet_attributes.each do |attribute|
> >>     bundled_search.search(query, class_name, facet_search_options(attribute))
> >>   end
> >>   bundled_search
> >> end
> >>
> >> after that all results are mapped in facet has
> >>
> >> def facets
> >>   self.search.results
> >>   res = {}
> >>   self.search.results.each_with_index do |result, index|
> >>     attr_name = class_name.facet_attributes[index].to_s
> >>     res[attr_name] = result[:matches].map{|o| [o[:attributes][attr_name], o[:attributes]["@count"]]}.to_hash
> >>   end
> >>   res
> >> end
> >>
> >>
> >> This code is based on Thinking Sphinx
> >>
> >> see full code in attachment
> >>
> >>
> >> Also I noted that group_by in thinking sphinx returns incorrect results too:
> >> Restaurant.search("pizza", :group_by=> :has_menu) -- returns only one result.
> >> debugging showed that:
> >> result[:matches].length == 1
> >> but when I run
> >> Restaurant.facets("pizza")
> >> debugging shows that
> >> results[0][:has_menu].length == 2
> >> this is correct
> >>
> >> As far as I understand, thinking sphinx uses group_by parameter, to calc facets. But why these results are different?
> >>
> >>
> >> Thanks
> >>
> >> 2011/3/23 Pat Allan <[email protected]>
> >> Hi Viacheslav
> >>
> >> Can you run me through the code you're using to make these queries? It does seem like something's wrong, but I need a bit more context.
> >>
> >> --
> >> Pat
> >>
> >> On 23/03/2011, at 2:38 AM, Viacheslav Dushin wrote:
> >>
> >>> Hello,
> >>>
> >>> I'm using latest version of Riddle from Github, Sphinx 0.9.9-release
> >>> (r2117) and xmlpipe2 as datasource for Sphinx.
> >>> I use group by to implement facets (similar to thinking sphinx, but
> >>> for xmlpipe2 datasource)
> >>> There is a problem: group by works incorrectly for "int", "bool" and
> >>> "multi" attributes, but it works ok float attributes.
> >>> Here is an example of output:
> >>>
> >>> grouping by has_menu -- bool:
> >>>
> >>>>> MongoRestaurant.facets("")[0]
> >>> => {:status=>0, :total_found=>1, :attribute_names=>["total_likes",
> >>> "neighborhood_ids", "lon", "has_delivery", "offer_type_ids",
> >>> "has_reservation", "cuisine_ids", "lat", "source_ids", "name_sort",
> >>> "address_zipcode", "has_menu", "offer_ids", "restaurant_ids",
> >>> "price_range", "score", "city_ids", "total_checkins",
> >>> "reviews_avg_score", "bh_id", "@groupby",
> >>> "@count"], :attributes=>{"offer_type_ids"=>1073741825, "lon"=>5,
> >>> "offer_ids"=>1073741825, "@groupby"=>1, "restaurant_ids"=>1,
> >>> "source_ids"=>1073741825, "reviews_avg_score"=>5, "total_checkins"=>1,
> >>> "total_likes"=>1, "@count"=>1, "address_zipcode"=>1,
> >>> "has_delivery"=>4, "city_ids"=>1, "has_menu"=>4,
> >>> "cuisine_ids"=>1073741825, "neighborhood_ids"=>1073741825, "bh_id"=>1,
> >>> "score"=>5, "price_range"=>3, "name_sort"=>3, "lat"=>5,
> >>> "has_reservation"=>4}, :words=>{"400456007"=>{:docs=>70, :hits=>70}}, :time=>0.0, :fields=>["classnamecrc32",
> >>> "name", "description", "offer_text", "offer_type_value",
> >>> "cuisine_name"], :matches=>[{:doc=>598000, :attributes=>{"offer_type_ids"=>[],
> >>> "lon"=>-1.29096734523773, "offer_ids"=>[], "@groupby"=>19700101,
> >>> "restaurant_ids"=>598000, "source_ids"=>[], "reviews_avg_score"=>0.0,
> >>> "total_checkins"=>0, "total_likes"=>0, "@count"=>70,
> >>> "address_zipcode"=>10022, "has_delivery"=>1, "city_ids"=>18819,
> >>> "has_menu"=>1, "cuisine_ids"=>[], "neighborhood_ids"=>[16, 22, 56,
> >>> 60], "bh_id"=>598000, "score"=>0.0, "price_range"=>69,
> >>> "name_sort"=>68, "lat"=>0.711285769939423,
> >>> "has_reservation"=>0}, :index=>0, :weight=>1273}], :total=>1}
> >>>
> >>> grouping by total_likes -- integer:
> >>>
> >>> => {:status=>0, :total_found=>1, :attribute_names=>["total_likes",
> >>> "neighborhood_ids", "lon", "has_delivery", "offer_type_ids",
> >>> "has_reservation", "cuisine_ids", "lat", "source_ids", "name_sort",
> >>> "address_zipcode", "has_menu", "offer_ids", "restaurant_ids",
> >>> "price_range", "score", "city_ids", "total_checkins",
> >>> "reviews_avg_score", "bh_id", "@groupby",
> >>> "@count"], :attributes=>{"offer_type_ids"=>1073741825, "lon"=>5,
> >>> "offer_ids"=>1073741825, "@groupby"=>1, "restaurant_ids"=>1,
> >>> "source_ids"=>1073741825, "reviews_avg_score"=>5, "total_checkins"=>1,
> >>> "total_likes"=>1, "@count"=>1, "address_zipcode"=>1,
> >>> "has_delivery"=>4, "city_ids"=>1, "has_menu"=>4,
> >>> "cuisine_ids"=>1073741825, "neighborhood_ids"=>1073741825, "bh_id"=>1,
> >>> "score"=>5, "price_range"=>3, "name_sort"=>3, "lat"=>5,
> >>> "has_reservation"=>4}, :words=>{"400456007"=>{:docs=>70, :hits=>70}}, :time=>0.001, :fields=>["classnamecrc32",
> >>> "name", "description", "offer_text", "offer_type_value",
> >>> "cuisine_name"], :matches=>[{:doc=>598000, :attributes=>{"offer_type_ids"=>[],
> >>> "lon"=>-1.29096734523773, "offer_ids"=>[], "@groupby"=>19700101,
> >>> "restaurant_ids"=>598000, "source_ids"=>[], "reviews_avg_score"=>0.0,
> >>> "total_checkins"=>0, "total_likes"=>0, "@count"=>70,
> >>> "address_zipcode"=>10022, "has_delivery"=>1, "city_ids"=>18819,
> >>> "has_menu"=>1, "cuisine_ids"=>[], "neighborhood_ids"=>[16, 22, 56,
> >>> 60], "bh_id"=>598000, "score"=>0.0, "price_range"=>69,
> >>> "name_sort"=>68, "lat"=>0.711285769939423,
> >>> "has_reservation"=>0}, :index=>0, :weight=>1273}], :total=>1}
> >>>
> >>>
> >>> grouping by reviews_avg_score -- float
> >>>
> >>> => {:status=>0, :total_found=>9, :attribute_names=>["total_likes",
> >>> "neighborhood_ids", "lon", "has_delivery", "offer_type_ids",
> >>> "has_reservation", "cuisine_ids", "lat", "source_ids", "name_sort",
> >>> "address_zipcode", "has_menu", "offer_ids", "restaurant_ids",
> >>> "price_range", "score", "city_ids", "total_checkins",
> >>> "reviews_avg_score", "bh_id", "@groupby",
> >>> "@count"], :attributes=>{"offer_type_ids"=>1073741825, "lon"=>5,
> >>> "offer_ids"=>1073741825, "@groupby"=>1, "restaurant_ids"=>1,
> >>> "source_ids"=>1073741825, "reviews_avg_score"=>5, "total_checkins"=>1,
> >>> "total_likes"=>1, "@count"=>1, "address_zipcode"=>1,
> >>> "has_delivery"=>4, "city_ids"=>1, "has_menu"=>4,
> >>> "cuisine_ids"=>1073741825, "neighborhood_ids"=>1073741825, "bh_id"=>1,
> >>> "score"=>5, "price_range"=>3, "name_sort"=>3, "lat"=>5,
> >>> "has_reservation"=>4}, :words=>{"400456007"=>{:docs=>70, :hits=>70}}, :time=>0.001, :fields=>["classnamecrc32",
> >>> "name", "description", "offer_text", "offer_type_value",
> >>> "cuisine_name"], :matches=>[{:doc=>598261, :attributes=>{"offer_type_ids"=>[],
> >>> "lon"=>-1.29073655605316, "offer_ids"=>[], "@groupby"=>20040816,
> >>> "restaurant_ids"=>598261, "source_ids"=>[], "reviews_avg_score"=>10.0,
> >>> "total_checkins"=>0, "total_likes"=>0, "@count"=>5,
> >>> "address_zipcode"=>10028, "has_delivery"=>1, "city_ids"=>18819,
> >>> "has_menu"=>1, "cuisine_ids"=>[21], "neighborhood_ids"=>[8, 22, 23,
> >>> 57], "bh_id"=>598261, "score"=>0.0, "price_range"=>69,
> >>> "name_sort"=>64, "lat"=>0.711645185947418,
> >>> "has_reservation"=>0}, :index=>0, :weight=>1273},
> >>> {:doc=>598904, :attributes=>{"offer_type_ids"=>[11],
> >>> "lon"=>-1.29076039791107, "offer_ids"=>[622122], "@groupby"=>20040804,
> >>> "restaurant_ids"=>598904, "source_ids"=>[15],
> >>> "reviews_avg_score"=>9.0, "total_checkins"=>13, "total_likes"=>1,
> >>> "@count"=>3, "address_zipcode"=>10021, "has_delivery"=>1,
> >>> "city_ids"=>18819, "has_menu"=>0, "cuisine_ids"=>[],
> >>> "neighborhood_ids"=>[22, 23, 28, 57], "bh_id"=>598904,
> >>> "score"=>2.4300000667572, "price_range"=>69, "name_sort"=>26,
> >>> "lat"=>0.711563467979431,
> >>> "has_reservation"=>0}, :index=>1, :weight=>1273},
> >>> {:doc=>598488, :attributes=>{"offer_type_ids"=>[], "lon"=>0.0,
> >>> "offer_ids"=>[], "@groupby"=>20040722, "restaurant_ids"=>598488,
> >>> "source_ids"=>[], "reviews_avg_score"=>8.0, "total_checkins"=>0,
> >>> "total_likes"=>0, "@count"=>3, "address_zipcode"=>10012,
> >>> "has_delivery"=>1, "city_ids"=>18819, "has_menu"=>1,
> >>> "cuisine_ids"=>[33], "neighborhood_ids"=>[2, 9, 22, 61],
> >>> "bh_id"=>598488, "score"=>0.0, "price_range"=>69, "name_sort"=>37,
> >>> "lat"=>0.0, "has_reservation"=>0}, :index=>2, :weight=>1273},
> >>> {:doc=>599149, :attributes=>{"offer_type_ids"=>[],
> >>> "lon"=>-1.29116952419281, "offer_ids"=>[], "@groupby"=>20040628,
> >>> "restaurant_ids"=>599149, "source_ids"=>[], "reviews_avg_score"=>7.0,
> >>> "total_checkins"=>22, "total_likes"=>0, "@count"=>2,
> >>> "address_zipcode"=>10017, "has_delivery"=>1, "city_ids"=>18819,
> >>> "has_menu"=>0, "cuisine_ids"=>[28], "neighborhood_ids"=>[6, 22, 60],
> >>> "bh_id"=>599149, "score"=>0.0, "price_range"=>69, "name_sort"=>45,
> >>> "lat"=>0.711319506168365,
> >>> "has_reservation"=>0}, :index=>3, :weight=>1273},
> >>> {:doc=>598304, :attributes=>{"offer_type_ids"=>[],
> >>> "lon"=>-1.52945172786713, "offer_ids"=>[], "@groupby"=>20040604,
> >>> "restaurant_ids"=>598304, "source_ids"=>[], "reviews_avg_score"=>6.0,
> >>> "total_checkins"=>115, "total_likes"=>0, "@count"=>5,
> >>> "address_zipcode"=>60604, "has_delivery"=>1, "city_ids"=>6335,
> >>> "has_menu"=>1, "cuisine_ids"=>[], "neighborhood_ids"=>[240],
> >>> "bh_id"=>598304, "score"=>0.0, "price_range"=>69, "name_sort"=>0,
> >>> "lat"=>0.73090934753418,
> >>> "has_reservation"=>0}, :index=>4, :weight=>1273},
> >>> {:doc=>598791, :attributes=>{"offer_type_ids"=>[],
> >>> "lon"=>-1.29123413562775, "offer_ids"=>[], "@groupby"=>20040511,
> >>> "restaurant_ids"=>598791, "source_ids"=>[], "reviews_avg_score"=>5.0,
> >>> "total_checkins"=>12, "total_likes"=>6, "@count"=>1,
> >>> "address_zipcode"=>10018, "has_delivery"=>1, "city_ids"=>18819,
> >>> "has_menu"=>1, "cuisine_ids"=>[2], "neighborhood_ids"=>[7, 22, 60],
> >>> "bh_id"=>598791, "score"=>0.0, "price_range"=>70, "name_sort"=>7,
> >>> "lat"=>0.711277902126312,
> >>> "has_reservation"=>0}, :index=>5, :weight=>1273},
> >>> {:doc=>598474, :attributes=>{"offer_type_ids"=>[],
> >>> "lon"=>-1.52936661243439, "offer_ids"=>[], "@groupby"=>20040416,
> >>> "restaurant_ids"=>598474, "source_ids"=>[], "reviews_avg_score"=>4.0,
> >>> "total_checkins"=>925, "total_likes"=>5, "@count"=>1,
> >>> "address_zipcode"=>60611, "has_delivery"=>1, "city_ids"=>6335,
> >>> "has_menu"=>1, "cuisine_ids"=>[], "neighborhood_ids"=>[217, 287],
> >>> "bh_id"=>598474, "score"=>0.0, "price_range"=>71, "name_sort"=>10,
> >>> "lat"=>0.731164395809174,
> >>> "has_reservation"=>0}, :index=>6, :weight=>1273},
> >>> {:doc=>598689, :attributes=>{"offer_type_ids"=>[],
> >>> "lon"=>-1.29161155223846, "offer_ids"=>[], "@groupby"=>20040110,
> >>> "restaurant_ids"=>598689, "source_ids"=>[], "reviews_avg_score"=>2.0,
> >>> "total_checkins"=>2, "total_likes"=>0, "@count"=>2,
> >>> "address_zipcode"=>10007, "has_delivery"=>1, "city_ids"=>18819,
> >>> "has_menu"=>1, "cuisine_ids"=>[], "neighborhood_ids"=>[2, 17, 22, 65],
> >>> "bh_id"=>598689, "score"=>0.0, "price_range"=>69, "name_sort"=>15,
> >>> "lat"=>0.71059387922287,
> >>> "has_reservation"=>0}, :index=>7, :weight=>1273},
> >>> {:doc=>598000, :attributes=>{"offer_type_ids"=>[],
> >>> "lon"=>-1.29096734523773, "offer_ids"=>[], "@groupby"=>19700101,
> >>> "restaurant_ids"=>598000, "source_ids"=>[], "reviews_avg_score"=>0.0,
> >>> "total_checkins"=>0, "total_likes"=>0, "@count"=>48,
> >>> "address_zipcode"=>10022, "has_delivery"=>1, "city_ids"=>18819,
> >>> "has_menu"=>1, "cuisine_ids"=>[], "neighborhood_ids"=>[16, 22, 56,
> >>> 60], "bh_id"=>598000, "score"=>0.0, "price_range"=>69,
> >>> "name_sort"=>68, "lat"=>0.711285769939423,
> >>> "has_reservation"=>0}, :index=>8, :weight=>1273}], :total=>9}
> >>>
> >>>
> >>>
> >>> You can easily note that grouping by :has_menu and :total_likes
> >>> returns only one result (:total_found=>1). It is incorrect: there are
> >>> records with :has_menu == false, total_likes = 1, total_likes = 2 etc.
> >>> Only group by reviews_avg_score returns correct results
> >>>
> >>>
> >>> Example of xml data source:
> >>> <?xml version="1.0" encoding="utf-8"?>
> >>> <sphinx:docset>
> >>>   <sphinx:schema>
> >>>     <sphinx:field name="classnamecrc32"/>
> >>>     <sphinx:field name="name"/>
> >>>     <sphinx:field name="description"/>
> >>>     <sphinx:field name="offer_text"/>
> >>>     <sphinx:field name="offer_type_value"/>
> >>>     <sphinx:field name="cuisine_name"/>
> >>>     <sphinx:attr name="address_zipcode" type="int"/>
> >>>     <sphinx:attr name="restaurant_ids" type="int"/>
> >>>     <sphinx:attr name="lat" type="float"/>
> >>>     <sphinx:attr name="has_delivery" type="bool"/>
> >>>     <sphinx:attr name="source_ids" type="multi"/>
> >>>     <sphinx:attr name="lon" type="float"/>
> >>>     <sphinx:attr name="has_reservation" type="bool"/>
> >>>     <sphinx:attr name="offer_type_ids" type="multi"/>
> >>>     <sphinx:attr name="price_range" type="str2ordinal"/>
> >>>     <sphinx:attr name="has_menu" type="bool"/>
> >>>     <sphinx:attr name="score" type="float"/>
> >>>     <sphinx:attr name="neighborhood_ids" type="multi"/>
> >>>     <sphinx:attr name="cuisine_ids" type="multi"/>
> >>>     <sphinx:attr name="total_checkins" type="int"/>
> >>>     <sphinx:attr name="offer_ids" type="multi"/>
> >>>     <sphinx:attr name="reviews_avg_score" type="float"/>
> >>>     <sphinx:attr name="city_ids" type="int"/>
> >>>     <sphinx:attr name="name_sort" type="str2ordinal"/>
> >>>     <sphinx:attr name="total_likes" type="int"/>
> >>>     <sphinx:attr name="bh_id" type="int"/>
> >>>   </sphinx:schema>
> >>>   <sphinx:document id="599105">
> >>>     <classnamecrc32>400456007</classnamecrc32>
> >>>     <name><![CDATA[Subway]]></name>
> >>>     <description><![CDATA[test]]></description>
> >>>     <offer_text><![CDATA[]]></offer_text>
> >>>     <offer_type_value><![CDATA[]]></offer_type_value>
> >>>     <cuisine_name><![CDATA[]]></cuisine_name>
> >>>     <address_zipcode>60622</address_zipcode>
> >>>     <restaurant_ids>599105</restaurant_ids>
> >>>     <lat>0.731224661851994</lat>
> >>>     <has_delivery>1</has_delivery>
> >>>     <source_ids></source_ids>
> >>>     <lon>-1.53024979754365</lon>
> >>>     <has_reservation>0</has_reservation>
> >>>     <offer_type_ids></offer_type_ids>
> >>>     <price_range>1</price_range>
> >>>     <has_menu>1</has_menu>
> >>>     <score>0.0</score>
> >>>     <neighborhood_ids>201,202,284</neighborhood_ids>
> >>>     <cuisine_ids></cuisine_ids>
> >>>     <total_checkins>5</total_checkins>
> >>>     <offer_ids></offer_ids>
> >>>     <reviews_avg_score>0</reviews_avg_score>
> >>>     <city_ids>6335</city_ids>
> >>>     <name_sort>Subway</name_sort>
> >>>     <total_likes>0</total_likes>
> >>>     <bh_id>599105</bh_id>
> >>>   </sphinx:document>
> >>> </sphinx:docset>
> >>>
> >>>
> >>> Thanks, Slava
> >>>
> >>> --
> >>> You received this message because you are subscribed to the Google Groups "Thinking Sphinx" group.
> >>> To post to this group, send email to [email protected].
> >>> To unsubscribe from this group, send email to [email protected].
> >>> For more options, visit this group at http://groups.google.com/group/thinking-sphinx?hl=en.
> >>>
> >> <mongoid-sphinx.zip>
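One possible reading of the output quoted in this thread, offered as a sketch and not a confirmed diagnosis: the `@groupby` keys (19700101, 20040816, 20040110 and so on) look like YYYYMMDD dates, not attribute values. That is the pattern one would expect if the stored 32-bit value were being bucketed by day as a Unix timestamp. Small integers and booleans would then all land on 1970-01-01 (one group), while the bit patterns of different floats fall on different days in 2004 (many groups), which would make the float case only appear to work. The Ruby below checks that arithmetic against the numbers in the thread. `float_bits` and `day_key` are hypothetical helper names, not part of Riddle or Sphinx, and the use of UTC is an assumption: keys that fall close to midnight can shift by a day with the server's time zone.

```ruby
# Reinterpret a single-precision float's IEEE-754 bit pattern as an
# unsigned 32-bit integer (big-endian pack, big-endian unpack).
def float_bits(value)
  [value].pack("g").unpack1("N")
end

# Format a Unix timestamp as a YYYYMMDD integer, using UTC day boundaries
# (assumption: the real server may use a different time zone).
def day_key(timestamp)
  Time.at(timestamp).utc.strftime("%Y%m%d").to_i
end

# Raw values as they would be stored: bool/int attributes hold small
# integers, reviews_avg_score holds floats (values taken from the thread).
samples = {
  "has_menu = 1"             => 1,
  "total_likes = 0"          => 0,
  "reviews_avg_score = 10.0" => float_bits(10.0),
  "reviews_avg_score = 7.0"  => float_bits(7.0),
  "reviews_avg_score = 2.0"  => float_bits(2.0)
}

samples.each do |label, raw|
  puts format("%-26s raw=%-11d day key=%d", label, raw, day_key(raw))
end
```

For 10.0, 7.0 and 2.0 this yields 20040816, 20040628 and 20040110, the same keys that appear next to those scores in the float output, and both integer samples yield 19700101. If that reading holds, the thing to check would be the group function sent with the query, not the attribute type. As far as I know, Riddle exposes this as the client's `group_function` setting, with `:attr` meaning "group by the attribute value itself"; treat that as an assumption to verify against the Riddle source.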
