And I should have looked at your code before saying that, since you're already
doing the same thing. Damn, thought I had it figured out...

Let me ponder some more.

-- 
Pat

On 23/03/2011, at 11:06 AM, Pat Allan wrote:

> Ah, I think I know what the problem is... if you look at the facet code in 
> TS, you'll find both limit and max_matches are being set to either 1000 
> (Sphinx's default) or the custom value (sometimes larger) so Sphinx looks at 
> the widest possible set of results to figure out the values.
> 
> https://github.com/freelancing-god/thinking-sphinx/blob/master/lib/thinking_sphinx/facet_search.rb#L59-60
> 
> Try doing the same for your facet search - as it's possible Sphinx just isn't 
> getting deep enough into the result set to find other combinations.
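
A minimal sketch of that suggestion, assuming the facet search options are built as a plain hash. The helper name facet_search_options comes from the quoted code further down, but its body, the DEFAULT_MAX_MATCHES constant and the option keys here are assumptions for illustration, not the actual Thinking Sphinx or mongoid-sphinx source:

```ruby
# Sphinx's default cap on matches, as mentioned in the message above.
DEFAULT_MAX_MATCHES = 1000

# Hypothetical helper: builds the options for one grouped facet query.
# Both :limit and :max_matches are raised to the same value so Sphinx
# considers the widest possible result set when working out the groups.
def facet_search_options(attribute, options = {})
  max = options[:max_matches] || DEFAULT_MAX_MATCHES
  options.merge(
    :group_by    => attribute.to_s,
    :limit       => max,
    :max_matches => max
  )
end

default_opts = facet_search_options(:has_menu)
custom_opts  = facet_search_options(:total_likes, :max_matches => 5000)
```

With no custom value both settings fall back to 1000; with a custom :max_matches both follow it.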
> 
> -- 
> Pat
> 
> On 23/03/2011, at 10:55 AM, Viacheslav Dushin wrote:
> 
>> Hi, Pat
>> 
>> This indexer is a very simplified version of Thinking Sphinx.
>> 
>> The facets method is in:
>> 
>> mongoid-sphinx/lib/mongoid_sphinx/mongoid/sphinx.rb
>> 
>>      def facets(*args)
>>        options = args.extract_options!
>>        query = args.join(" ")
>>        MongoidSphinx::FacetSearch.new(query,self, options).facets
>>      end
>> 
>> The facet search builds a set of bundled searches (facet_search.rb).
>> Each search in the bundle is grouped by a different facet attribute:
>>    def search
>>      return if class_name.facet_attributes.blank?
>>      bundled_search = BundledSearch.new
>>      class_name.facet_attributes.each do |attribute|
>>        bundled_search.search(query, class_name, facet_search_options(attribute))
>>      end
>>      bundled_search
>>    end
>> 
>> After that, all results are mapped into a facet hash:
>> 
>>    def facets
>>      self.search.results
>>       res = {}
>>       self.search.results.each_with_index do |result, index|
>>         attr_name = class_name.facet_attributes[index].to_s
>>         res[attr_name] = result[:matches].map { |o|
>>           [o[:attributes][attr_name], o[:attributes]["@count"]]
>>         }.to_hash
>>       end
>>       res
>>    end
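
To make the mapping step concrete, here is a small standalone sketch using three of the grouped matches from the float example further down the thread. The helper name facet_counts is hypothetical, and it uses Hash[] because Array#to_hash is not part of core Ruby as far as I know (the quoted code presumably relies on an extension defined elsewhere in the project):

```ruby
# Hypothetical helper: turns Sphinx grouped matches into { value => count }.
# Each match carries the grouped attribute value and an "@count" attribute.
def facet_counts(matches, attr_name)
  pairs = matches.map do |match|
    attrs = match[:attributes]
    [attrs[attr_name], attrs["@count"]]
  end
  Hash[pairs]
end

# Trimmed-down matches taken from the reviews_avg_score output in this thread.
matches = [
  { :attributes => { "reviews_avg_score" => 10.0, "@count" => 5 } },
  { :attributes => { "reviews_avg_score" => 9.0,  "@count" => 3 } },
  { :attributes => { "reviews_avg_score" => 8.0,  "@count" => 3 } }
]

counts = facet_counts(matches, "reviews_avg_score")
```

Each key in counts is a facet value and each value is the number of documents in that group.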
>> 
>> 
>> This code is based on Thinking Sphinx.
>> 
>> See the full code in the attachment.
>> 
>> 
>> I also noticed that group_by in Thinking Sphinx returns incorrect results:
>> Restaurant.search("pizza", :group_by=> :has_menu) returns only one result.
>> Debugging showed that:
>> result[:matches].length == 1
>> but when I run
>> Restaurant.facets("pizza")
>> debugging shows that
>> results[0][:has_menu].length == 2
>> which is correct.
>> 
>> As far as I understand, Thinking Sphinx uses the group_by parameter to
>> calculate facets. So why are these results different?
>> 
>> 
>> Thanks
>> 
>> 2011/3/23 Pat Allan <[email protected]>
>> Hi Viacheslav
>> 
>> Can you run me through the code you're using to make these queries? It does 
>> seem like something's wrong, but I need a bit more context.
>> 
>> --
>> Pat
>> 
>> On 23/03/2011, at 2:38 AM, Viacheslav Dushin wrote:
>> 
>>> Hello,
>>> 
>>> I'm using the latest version of Riddle from GitHub, Sphinx 0.9.9-release
>>> (r2117), and xmlpipe2 as the data source for Sphinx.
>>> I use group by to implement facets (similar to Thinking Sphinx, but
>>> for an xmlpipe2 data source).
>>> There is a problem: group by works incorrectly for "int", "bool" and
>>> "multi" attributes, but it works fine for float attributes.
>>> Here is an example of the output:
>>> 
>>> grouping by has_menu -- bool:
>>> 
>>> >> MongoRestaurant.facets("")[0]
>>> => {:status=>0, :total_found=>1, :attribute_names=>["total_likes",
>>> "neighborhood_ids", "lon", "has_delivery", "offer_type_ids",
>>> "has_reservation", "cuisine_ids", "lat", "source_ids", "name_sort",
>>> "address_zipcode", "has_menu", "offer_ids", "restaurant_ids",
>>> "price_range", "score", "city_ids", "total_checkins",
>>> "reviews_avg_score", "bh_id", "@groupby",
>>> "@count"], :attributes=>{"offer_type_ids"=>1073741825, "lon"=>5,
>>> "offer_ids"=>1073741825, "@groupby"=>1, "restaurant_ids"=>1,
>>> "source_ids"=>1073741825, "reviews_avg_score"=>5, "total_checkins"=>1,
>>> "total_likes"=>1, "@count"=>1, "address_zipcode"=>1,
>>> "has_delivery"=>4, "city_ids"=>1, "has_menu"=>4,
>>> "cuisine_ids"=>1073741825, "neighborhood_ids"=>1073741825, "bh_id"=>1,
>>> "score"=>5, "price_range"=>3, "name_sort"=>3, "lat"=>5,
>>> "has_reservation"=>4}, :words=>{"400456007"=>{:docs=>70, :hits=>70}}, 
>>> :time=>0.0, :fields=>["classnamecrc32",
>>> "name", "description", "offer_text", "offer_type_value",
>>> "cuisine_name"], :matches=>[{:doc=>598000, 
>>> :attributes=>{"offer_type_ids"=>[],
>>> "lon"=>-1.29096734523773, "offer_ids"=>[], "@groupby"=>19700101,
>>> "restaurant_ids"=>598000, "source_ids"=>[], "reviews_avg_score"=>0.0,
>>> "total_checkins"=>0, "total_likes"=>0, "@count"=>70,
>>> "address_zipcode"=>10022, "has_delivery"=>1, "city_ids"=>18819,
>>> "has_menu"=>1, "cuisine_ids"=>[], "neighborhood_ids"=>[16, 22, 56,
>>> 60], "bh_id"=>598000, "score"=>0.0, "price_range"=>69,
>>> "name_sort"=>68, "lat"=>0.711285769939423,
>>> "has_reservation"=>0}, :index=>0, :weight=>1273}], :total=>1}
>>> 
>>> grouping by total_likes -- integer:
>>> 
>>> => {:status=>0, :total_found=>1, :attribute_names=>["total_likes",
>>> "neighborhood_ids", "lon", "has_delivery", "offer_type_ids",
>>> "has_reservation", "cuisine_ids", "lat", "source_ids", "name_sort",
>>> "address_zipcode", "has_menu", "offer_ids", "restaurant_ids",
>>> "price_range", "score", "city_ids", "total_checkins",
>>> "reviews_avg_score", "bh_id", "@groupby",
>>> "@count"], :attributes=>{"offer_type_ids"=>1073741825, "lon"=>5,
>>> "offer_ids"=>1073741825, "@groupby"=>1, "restaurant_ids"=>1,
>>> "source_ids"=>1073741825, "reviews_avg_score"=>5, "total_checkins"=>1,
>>> "total_likes"=>1, "@count"=>1, "address_zipcode"=>1,
>>> "has_delivery"=>4, "city_ids"=>1, "has_menu"=>4,
>>> "cuisine_ids"=>1073741825, "neighborhood_ids"=>1073741825, "bh_id"=>1,
>>> "score"=>5, "price_range"=>3, "name_sort"=>3, "lat"=>5,
>>> "has_reservation"=>4}, :words=>{"400456007"=>{:docs=>70, :hits=>70}}, 
>>> :time=>0.001, :fields=>["classnamecrc32",
>>> "name", "description", "offer_text", "offer_type_value",
>>> "cuisine_name"], :matches=>[{:doc=>598000, 
>>> :attributes=>{"offer_type_ids"=>[],
>>> "lon"=>-1.29096734523773, "offer_ids"=>[], "@groupby"=>19700101,
>>> "restaurant_ids"=>598000, "source_ids"=>[], "reviews_avg_score"=>0.0,
>>> "total_checkins"=>0, "total_likes"=>0, "@count"=>70,
>>> "address_zipcode"=>10022, "has_delivery"=>1, "city_ids"=>18819,
>>> "has_menu"=>1, "cuisine_ids"=>[], "neighborhood_ids"=>[16, 22, 56,
>>> 60], "bh_id"=>598000, "score"=>0.0, "price_range"=>69,
>>> "name_sort"=>68, "lat"=>0.711285769939423,
>>> "has_reservation"=>0}, :index=>0, :weight=>1273}], :total=>1}
>>> 
>>> 
>>> grouping by reviews_avg_score -- float
>>> 
>>> => {:status=>0, :total_found=>9, :attribute_names=>["total_likes",
>>> "neighborhood_ids", "lon", "has_delivery", "offer_type_ids",
>>> "has_reservation", "cuisine_ids", "lat", "source_ids", "name_sort",
>>> "address_zipcode", "has_menu", "offer_ids", "restaurant_ids",
>>> "price_range", "score", "city_ids", "total_checkins",
>>> "reviews_avg_score", "bh_id", "@groupby",
>>> "@count"], :attributes=>{"offer_type_ids"=>1073741825, "lon"=>5,
>>> "offer_ids"=>1073741825, "@groupby"=>1, "restaurant_ids"=>1,
>>> "source_ids"=>1073741825, "reviews_avg_score"=>5, "total_checkins"=>1,
>>> "total_likes"=>1, "@count"=>1, "address_zipcode"=>1,
>>> "has_delivery"=>4, "city_ids"=>1, "has_menu"=>4,
>>> "cuisine_ids"=>1073741825, "neighborhood_ids"=>1073741825, "bh_id"=>1,
>>> "score"=>5, "price_range"=>3, "name_sort"=>3, "lat"=>5,
>>> "has_reservation"=>4}, :words=>{"400456007"=>{:docs=>70, :hits=>70}}, 
>>> :time=>0.001, :fields=>["classnamecrc32",
>>> "name", "description", "offer_text", "offer_type_value",
>>> "cuisine_name"], :matches=>[{:doc=>598261, 
>>> :attributes=>{"offer_type_ids"=>[],
>>> "lon"=>-1.29073655605316, "offer_ids"=>[], "@groupby"=>20040816,
>>> "restaurant_ids"=>598261, "source_ids"=>[], "reviews_avg_score"=>10.0,
>>> "total_checkins"=>0, "total_likes"=>0, "@count"=>5,
>>> "address_zipcode"=>10028, "has_delivery"=>1, "city_ids"=>18819,
>>> "has_menu"=>1, "cuisine_ids"=>[21], "neighborhood_ids"=>[8, 22, 23,
>>> 57], "bh_id"=>598261, "score"=>0.0, "price_range"=>69,
>>> "name_sort"=>64, "lat"=>0.711645185947418,
>>> "has_reservation"=>0}, :index=>0, :weight=>1273},
>>> {:doc=>598904, :attributes=>{"offer_type_ids"=>[11],
>>> "lon"=>-1.29076039791107, "offer_ids"=>[622122], "@groupby"=>20040804,
>>> "restaurant_ids"=>598904, "source_ids"=>[15],
>>> "reviews_avg_score"=>9.0, "total_checkins"=>13, "total_likes"=>1,
>>> "@count"=>3, "address_zipcode"=>10021, "has_delivery"=>1,
>>> "city_ids"=>18819, "has_menu"=>0, "cuisine_ids"=>[],
>>> "neighborhood_ids"=>[22, 23, 28, 57], "bh_id"=>598904,
>>> "score"=>2.4300000667572, "price_range"=>69, "name_sort"=>26,
>>> "lat"=>0.711563467979431,
>>> "has_reservation"=>0}, :index=>1, :weight=>1273},
>>> {:doc=>598488, :attributes=>{"offer_type_ids"=>[], "lon"=>0.0,
>>> "offer_ids"=>[], "@groupby"=>20040722, "restaurant_ids"=>598488,
>>> "source_ids"=>[], "reviews_avg_score"=>8.0, "total_checkins"=>0,
>>> "total_likes"=>0, "@count"=>3, "address_zipcode"=>10012,
>>> "has_delivery"=>1, "city_ids"=>18819, "has_menu"=>1,
>>> "cuisine_ids"=>[33], "neighborhood_ids"=>[2, 9, 22, 61],
>>> "bh_id"=>598488, "score"=>0.0, "price_range"=>69, "name_sort"=>37,
>>> "lat"=>0.0, "has_reservation"=>0}, :index=>2, :weight=>1273},
>>> {:doc=>599149, :attributes=>{"offer_type_ids"=>[],
>>> "lon"=>-1.29116952419281, "offer_ids"=>[], "@groupby"=>20040628,
>>> "restaurant_ids"=>599149, "source_ids"=>[], "reviews_avg_score"=>7.0,
>>> "total_checkins"=>22, "total_likes"=>0, "@count"=>2,
>>> "address_zipcode"=>10017, "has_delivery"=>1, "city_ids"=>18819,
>>> "has_menu"=>0, "cuisine_ids"=>[28], "neighborhood_ids"=>[6, 22, 60],
>>> "bh_id"=>599149, "score"=>0.0, "price_range"=>69, "name_sort"=>45,
>>> "lat"=>0.711319506168365,
>>> "has_reservation"=>0}, :index=>3, :weight=>1273},
>>> {:doc=>598304, :attributes=>{"offer_type_ids"=>[],
>>> "lon"=>-1.52945172786713, "offer_ids"=>[], "@groupby"=>20040604,
>>> "restaurant_ids"=>598304, "source_ids"=>[], "reviews_avg_score"=>6.0,
>>> "total_checkins"=>115, "total_likes"=>0, "@count"=>5,
>>> "address_zipcode"=>60604, "has_delivery"=>1, "city_ids"=>6335,
>>> "has_menu"=>1, "cuisine_ids"=>[], "neighborhood_ids"=>[240],
>>> "bh_id"=>598304, "score"=>0.0, "price_range"=>69, "name_sort"=>0,
>>> "lat"=>0.73090934753418,
>>> "has_reservation"=>0}, :index=>4, :weight=>1273},
>>> {:doc=>598791, :attributes=>{"offer_type_ids"=>[],
>>> "lon"=>-1.29123413562775, "offer_ids"=>[], "@groupby"=>20040511,
>>> "restaurant_ids"=>598791, "source_ids"=>[], "reviews_avg_score"=>5.0,
>>> "total_checkins"=>12, "total_likes"=>6, "@count"=>1,
>>> "address_zipcode"=>10018, "has_delivery"=>1, "city_ids"=>18819,
>>> "has_menu"=>1, "cuisine_ids"=>[2], "neighborhood_ids"=>[7, 22, 60],
>>> "bh_id"=>598791, "score"=>0.0, "price_range"=>70, "name_sort"=>7,
>>> "lat"=>0.711277902126312,
>>> "has_reservation"=>0}, :index=>5, :weight=>1273},
>>> {:doc=>598474, :attributes=>{"offer_type_ids"=>[],
>>> "lon"=>-1.52936661243439, "offer_ids"=>[], "@groupby"=>20040416,
>>> "restaurant_ids"=>598474, "source_ids"=>[], "reviews_avg_score"=>4.0,
>>> "total_checkins"=>925, "total_likes"=>5, "@count"=>1,
>>> "address_zipcode"=>60611, "has_delivery"=>1, "city_ids"=>6335,
>>> "has_menu"=>1, "cuisine_ids"=>[], "neighborhood_ids"=>[217, 287],
>>> "bh_id"=>598474, "score"=>0.0, "price_range"=>71, "name_sort"=>10,
>>> "lat"=>0.731164395809174,
>>> "has_reservation"=>0}, :index=>6, :weight=>1273},
>>> {:doc=>598689, :attributes=>{"offer_type_ids"=>[],
>>> "lon"=>-1.29161155223846, "offer_ids"=>[], "@groupby"=>20040110,
>>> "restaurant_ids"=>598689, "source_ids"=>[], "reviews_avg_score"=>2.0,
>>> "total_checkins"=>2, "total_likes"=>0, "@count"=>2,
>>> "address_zipcode"=>10007, "has_delivery"=>1, "city_ids"=>18819,
>>> "has_menu"=>1, "cuisine_ids"=>[], "neighborhood_ids"=>[2, 17, 22, 65],
>>> "bh_id"=>598689, "score"=>0.0, "price_range"=>69, "name_sort"=>15,
>>> "lat"=>0.71059387922287,
>>> "has_reservation"=>0}, :index=>7, :weight=>1273},
>>> {:doc=>598000, :attributes=>{"offer_type_ids"=>[],
>>> "lon"=>-1.29096734523773, "offer_ids"=>[], "@groupby"=>19700101,
>>> "restaurant_ids"=>598000, "source_ids"=>[], "reviews_avg_score"=>0.0,
>>> "total_checkins"=>0, "total_likes"=>0, "@count"=>48,
>>> "address_zipcode"=>10022, "has_delivery"=>1, "city_ids"=>18819,
>>> "has_menu"=>1, "cuisine_ids"=>[], "neighborhood_ids"=>[16, 22, 56,
>>> 60], "bh_id"=>598000, "score"=>0.0, "price_range"=>69,
>>> "name_sort"=>68, "lat"=>0.711285769939423,
>>> "has_reservation"=>0}, :index=>8, :weight=>1273}], :total=>9}
>>> 
>>> 
>>> 
>>> You can see that grouping by :has_menu and :total_likes
>>> returns only one result (:total_found=>1). That is incorrect: there are
>>> records with has_menu == false, total_likes == 1, total_likes == 2, etc.
>>> Only grouping by reviews_avg_score returns correct results.
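
One detail in the dumps above may be worth checking: the "@groupby" values (19700101, 20040816, ...) look like YYYYMMDD dates rather than attribute values. That would be consistent with Sphinx bucketing by day, treating each attribute's raw 32-bit value as a Unix timestamp, instead of grouping by the attribute itself. This is a reading of the numbers, not a confirmed diagnosis. The sketch below checks the arithmetic; float_bits and day_bucket are hypothetical helper names, and UTC is assumed (a server using local time could shift a bucket by a day):

```ruby
# Reinterpret a single-precision float's bits as an unsigned 32-bit integer.
# "e" packs a little-endian single float; "V" unpacks a little-endian uint32.
def float_bits(value)
  [value].pack("e").unpack("V").first
end

# Treat an integer as a Unix timestamp and return its YYYYMMDD day bucket (UTC).
def day_bucket(timestamp)
  Time.at(timestamp).utc.strftime("%Y%m%d").to_i
end

# Small ints and bools are all within seconds of the epoch, so they share a day.
int_buckets = [0, 1, 2].map { |v| day_bucket(v) }.uniq

# A float such as 10.0 has a large bit pattern, so it lands on a distinct day.
float_bucket = day_bucket(float_bits(10.0))
```

Under that reading, 0, 1 and 2 all fall on 19700101, which would explain the single group for has_menu and total_likes, while 10.0 falls on 20040816, matching the first match in the float output. If that is what is happening, the group function is the thing to check: grouping by attribute value (SPH_GROUPBY_ATTR in the Sphinx API) rather than by day.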
>>> 
>>> 
>>> Example of the XML data source:
>>> <?xml version="1.0" encoding="utf-8"?>
>>> <sphinx:docset>
>>> <sphinx:schema>
>>> <sphinx:field name="classnamecrc32"/>
>>> <sphinx:field name="name"/>
>>> <sphinx:field name="description"/>
>>> <sphinx:field name="offer_text"/>
>>> <sphinx:field name="offer_type_value"/>
>>> <sphinx:field name="cuisine_name"/>
>>> <sphinx:attr name="address_zipcode" type="int"/>
>>> <sphinx:attr name="restaurant_ids" type="int"/>
>>> <sphinx:attr name="lat" type="float"/>
>>> <sphinx:attr name="has_delivery" type="bool"/>
>>> <sphinx:attr name="source_ids" type="multi"/>
>>> <sphinx:attr name="lon" type="float"/>
>>> <sphinx:attr name="has_reservation" type="bool"/>
>>> <sphinx:attr name="offer_type_ids" type="multi"/>
>>> <sphinx:attr name="price_range" type="str2ordinal"/>
>>> <sphinx:attr name="has_menu" type="bool"/>
>>> <sphinx:attr name="score" type="float"/>
>>> <sphinx:attr name="neighborhood_ids" type="multi"/>
>>> <sphinx:attr name="cuisine_ids" type="multi"/>
>>> <sphinx:attr name="total_checkins" type="int"/>
>>> <sphinx:attr name="offer_ids" type="multi"/>
>>> <sphinx:attr name="reviews_avg_score" type="float"/>
>>> <sphinx:attr name="city_ids" type="int"/>
>>> <sphinx:attr name="name_sort" type="str2ordinal"/>
>>> <sphinx:attr name="total_likes" type="int"/>
>>> <sphinx:attr name="bh_id" type="int"/>
>>> </sphinx:schema>
>>> <sphinx:document id="599105">
>>> <classnamecrc32>400456007</classnamecrc32>
>>> <name><![CDATA[Subway]]></name>
>>> <description><![CDATA[test]]></description>
>>> <offer_text><![CDATA[]]></offer_text>
>>> <offer_type_value><![CDATA[]]></offer_type_value>
>>> <cuisine_name><![CDATA[]]></cuisine_name>
>>> <address_zipcode>60622</address_zipcode>
>>> <restaurant_ids>599105</restaurant_ids>
>>> <lat>0.731224661851994</lat>
>>> <has_delivery>1</has_delivery>
>>> <source_ids></source_ids>
>>> <lon>-1.53024979754365</lon>
>>> <has_reservation>0</has_reservation>
>>> <offer_type_ids></offer_type_ids>
>>> <price_range>1</price_range>
>>> <has_menu>1</has_menu>
>>> <score>0.0</score>
>>> <neighborhood_ids>201,202,284</neighborhood_ids>
>>> <cuisine_ids></cuisine_ids>
>>> <total_checkins>5</total_checkins>
>>> <offer_ids></offer_ids>
>>> <reviews_avg_score>0</reviews_avg_score>
>>> <city_ids>6335</city_ids>
>>> <name_sort>Subway</name_sort>
>>> <total_likes>0</total_likes>
>>> <bh_id>599105</bh_id>
>>> </sphinx:document>
>>> </sphinx:docset>
>>> 
>>> 
>>> Thanks, Slava
>>> 
>>> --
>>> You received this message because you are subscribed to the Google Groups 
>>> "Thinking Sphinx" group.
>>> To post to this group, send email to [email protected].
>>> To unsubscribe from this group, send email to 
>>> [email protected].
>>> For more options, visit this group at 
>>> http://groups.google.com/group/thinking-sphinx?hl=en.
>>> 
>> 
>> <mongoid-sphinx.zip>
> 
