Yeah, that's my thought too - was wondering if it was MySQL at fault, or Sphinx.
Sphinx 0.9.9 (and maybe 0.9.8) has a --enable-id64 flag when you configure - I think it's worth recompiling and trying indexing again. -- Pat On 15/12/2009, at 11:40 AM, Steve H wrote: > I found one thing very strange... no matter what the sql_range_step is > set to (default - 10,000,000), the last query always has an upper > limit of 2365744349. My very astute manager noticed that this is kind > of in the neighborhood of the largest possible value for an int. > Maybe that has something to do with it? > > -Steve > > On Dec 14, 4:35 pm, Steve H <[email protected]> wrote: >> mysql> SELECT IFNULL(MIN(`id`), 1), IFNULL(MAX(`id`), 1) >> -> FROM `businesses`; >> +----------------------+----------------------+ >> | IFNULL(MIN(`id`), 1) | IFNULL(MAX(`id`), 1) | >> +----------------------+----------------------+ >> | 1 | 15250646237 | >> +----------------------+----------------------+ >> >> On Dec 14, 4:32 pm, Pat Allan <[email protected]> wrote: >> >> >> >>> Hmm, if you run the sql_query_range query yourself, what does it return? >> >>> -- >>> Pat >> >>> On 15/12/2009, at 11:24 AM, Steve H wrote: >> >>>> Hi Pat, >> >>>> Here is the sql_query: >> >>>> sql_query = SELECT SQL_NO_CACHE `businesses`.`id` * 1 + 0 AS `id` , >>>> CAST(`businesses`.`name` AS CHAR) AS `name`, CAST(GROUP_CONCAT >>>> (DISTINCT IFNULL(`cs_categories`.`name`, '0') SEPARATOR ' ') AS CHAR) >>>> AS `category_names`, CAST(CONCAT_WS(' ', `city`.`city`, >>>> `city`.`state_code`) AS CHAR) AS `city`, `businesses`.`id` AS >>>> `sphinx_internal_id`, 4054125678 AS `class_crc`, '4054125678' AS >>>> `subclass_crcs`, 0 AS `sphinx_deleted`, IFNULL(`businesses`.`name`, >>>> '') AS `name_sort`, GROUP_CONCAT(DISTINCT IFNULL(IFNULL >>>> (`cs_categories`.`name`, ''), '0') SEPARATOR ' ') AS >>>> `category_names_sort`, CONCAT_WS(' ', IFNULL(`city`.`city`, ''), IFNULL >>>> (`city`.`state_code`, '')) AS `city_sort`, `businesses`.`lat` AS >>>> `lat`, `businesses`.`lon` AS `lon` FROM `businesses` LEFT OUTER >>>> JOIN `categorizations` ON (`businesses`.`id` = >>>> `categorizations`.`business_id`) LEFT OUTER JOIN `cs_categories` ON >>>> (`cs_categories`.`id` = `categorizations`.`cs_category_id`) LEFT >>>> OUTER JOIN `city` ON `city`.city_id = `businesses`.city_id WHERE >>>> `businesses`.`id` >= $start AND `businesses`.`id` <= $end GROUP BY >>>> `businesses`.`id` ORDER BY NULL >>>> sql_query_range = SELECT IFNULL(MIN(`id`), 1), IFNULL(MAX(`id`), 1) >>>> FROM `businesses` >> >>>> We turned on logging for every query in mysqld to troubleshoot this. >>>> This is the last query run: >> >>>> SELECT SQL_NO_CACHE `businesses`.`id` * 1 + 0 AS `id` , CAST >>>> (`businesses`.`name` AS CHAR) AS `name`, CAST(GROUP >>>> _CONCAT(DISTINCT IFNULL(`cs_categories`.`name`, '0') SEPARATOR ' ') AS >>>> CHAR) AS `category_names`, CAST(CONCAT_WS(' ', `city`.`city`, >>>> `city`.`state_code`) AS CHAR) AS `city`, `businesses`.`id` AS >>>> `sphinx_internal_id`, 4054125678 AS `class_crc`, '4054125678' AS >>>> `subclass_crcs`, 0 AS `sphinx_deleted`, IFNULL(`businesses`.`name`, >>>> '') AS `name_sort`, GROUP_CONCAT(DISTINCT IFNULL(IFNULL >>>> (`cs_categories`.`name`, ''), '0') SEPARATOR ' ') AS >>>> `category_names_sort`, CONCAT_WS(' ', IFNULL(`city`.`city`, ''), IFNULL >>>> (`city`.`state_code`, '')) AS `city_sort`, `businesses`.`lat` AS >>>> `lat`, `businesses`.`lon` AS `lon` FROM `businesses` LEFT OUTER >>>> JOIN `categorizations` ON (`businesses`.`id` = >>>> `categorizations`.`business_id`) LEFT OUTER JOIN `cs_categories` ON >>>> (`cs_categories`.`id` = `categorizations`.`cs_category_id`) LEFT >>>> OUTER JOIN `city` ON `city`.city_id = `businesses`.city_id WHERE >>>> `businesses`.`id` >= 2365000001 AND `businesses`.`id` <= 2365744349 >>>> GROUP BY `businesses`.`id` ORDER BY NULL >> >>>> Which returns zero rows. In fact, the last 15+ queries return zero >>>> rows. The businesses table has no records with ids >>>> 2034010806...2500480805. And of course, the maximum id is >>>> 15250646237. We need *all* the rows indexed, of course... :) >> >>>> So should we be increasing the step_size to like 10 million? >> >>>> -Steve >> >>>> On Dec 14, 2:58 pm, Pat Allan <[email protected]> wrote: >>>>> Hi Steve >> >>>>> Thinking Sphinx uses left outer joins (because that's what ActiveRecord >>>>> uses, and it's the best fit in this situation), and yes, one doc is one >>>>> row. >> >>>>> There's nothing odd in your index definition - so let's have a look at >>>>> the sql_query value of the source business_core_0, in >>>>> config/development.sphinx.conf (or production - they should be the same). >>>>> You should be able to run it manually, see if it returns the right >>>>> results (perhaps over a particular step, to keep things reasonably fast >>>>> and manageable). >> >>>>> -- >>>>> Pat >> >>>>> On 15/12/2009, at 3:26 AM, Steve H wrote: >> >>>>>> Thanks for the lightning-fast reply Pat. :) >> >>>>>> define_index do >>>>>> indexes :name, :sortable => true >>>>>> indexes categories_and_tags(:name), :as >>>>>> => :category_names, :sortable => true >>>>>> indexes [city.city, city.state_code], :as => :city, :sortable => >>>>>> true >>>>>> set_property :group_concat_max_len => 8192 >>>>>> # attributes >>>>>> has lat, lon >>>>>> end >> >>>>>> Every business has at least one category or tag, but I'm kind of >>>>>> assuming here that the indexer is doing a full left join (ie, not an >>>>>> inner join). >> >>>>>> Could something in this block be causing the problem? And is 1 doc = >>>>>> 1 row in the db? >> >>>>>> -Steve >> >>>>>> On Dec 13, 7:03 pm, Pat Allan <[email protected]> wrote: >>>>>>> Hi Steve >> >>>>>>> What does your define_index block for Business look like? >> >>>>>>> -- >>>>>>> Pat >> >>>>>>> On 14/12/2009, at 1:44 PM, Steve H wrote: >> >>>>>>>> OK first off -- I'm not 100% certain that 1 doc = 1 record. But if it >>>>>>>> DOES then, sphinx definitely didn't collect enough. We have over 13 >>>>>>>> million rows in our database, but this was the output after indexing: >> >>>>>>>> ----------------------------------------------- >>>>>>>> indexing index 'business_core'... >>>>>>>> collected 1199 docs, 0.7 MB >>>>>>>> collected 0 attr values >>>>>>>> sorted 0.0 Mvalues, 100.0% done >>>>>>>> sorted 0.1 Mhits, 100.0% done >>>>>>>> total 1199 docs, 685923 bytes >>>>>>>> total 1381.345 sec, 496.56 bytes/sec, 0.87 docs/sec >>>>>>>> distributed index 'business' can not be directly indexed; skipping. >>>>>>>> Started successfully (pid 32011). >>>>>>>> ----------------------------------------------- >> >>>>>>>> After reading some of the docs, I thought... maybe I have to increase >>>>>>>> the sql_range_step in the sphinx.yml or increase the >>>>>>>> group_concat_max_len property. So I did those things, but the output >>>>>>>> after indexing is the same (albeit much much faster :) >> >>>>>>>> ----------------------------------------------- >>>>>>>> Stopped search daemon (pid 2810). >>>>>>>> Generating Configuration to /home/steve/insiderpages/config/ >>>>>>>> sphinx_dev.sphinx.conf >>>>>>>> Sphinx 0.9.8.1-release (r1533) >>>>>>>> Copyright (c) 2001-2008, Andrew Aksyonoff >> >>>>>>>> using config file '/home/steve/insiderpages/config/ >>>>>>>> sphinx_dev.sphinx.conf'... >>>>>>>> indexing index 'business_core'... >>>>>>>> collected 1199 docs, 0.1 MB >>>>>>>> collected 0 attr values >>>>>>>> sorted 0.0 Mvalues, 100.0% done >>>>>>>> sorted 0.0 Mhits, 100.0% done >>>>>>>> total 1199 docs, 79605 bytes >>>>>>>> total 0.447 sec, 177901.38 bytes/sec, 2679.53 docs/sec >>>>>>>> distributed index 'business' can not be directly indexed; skipping. >>>>>>>> Started successfully (pid 6365). >>>>>>>> ----------------------------------------------- >> >>>>>>>> What's the problem? >> >>>>>>>> -Steve >> >>>>>>>> PS: >>>>>>>> mysql> select count(id) from businesses; >>>>>>>> +-----------+ >>>>>>>> | count(id) | >>>>>>>> +-----------+ >>>>>>>> | 13395683 | >>>>>>>> +-----------+ >>>>>>>> 1 row in set (0.00 sec) >> >>>>>>>> mysql> select max(id) from businesses; >>>>>>>> +-------------+ >>>>>>>> | max(id) | >>>>>>>> +-------------+ >>>>>>>> | 15250646126 | >>>>>>>> +-------------+ >>>>>>>> 1 row in set (0.00 sec) >> >>>>>>>> -- >> >>>>>>>> You received this message because you are subscribed to the Google >>>>>>>> Groups "Thinking Sphinx" group. >>>>>>>> To post to this group, send email to [email protected]. >>>>>>>> To unsubscribe from this group, send email to >>>>>>>> [email protected]. >>>>>>>> For more options, visit this group >>>>>>>> athttp://groups.google.com/group/thinking-sphinx?hl=en. >> >>>>>> -- >> >>>>>> You received this message because you are subscribed to the Google >>>>>> Groups "Thinking Sphinx" group. >>>>>> To post to this group, send email to [email protected]. >>>>>> To unsubscribe from this group, send email to >>>>>> [email protected]. >>>>>> For more options, visit this group >>>>>> athttp://groups.google.com/group/thinking-sphinx?hl=en. >> >>>> -- >> >>>> You received this message because you are subscribed to the Google Groups >>>> "Thinking Sphinx" group. >>>> To post to this group, send email to [email protected]. >>>> To unsubscribe from this group, send email to >>>> [email protected]. >>>> For more options, visit this group >>>> athttp://groups.google.com/group/thinking-sphinx?hl=en. > > -- > > You received this message because you are subscribed to the Google Groups > "Thinking Sphinx" group. > To post to this group, send email to [email protected]. > To unsubscribe from this group, send email to > [email protected]. > For more options, visit this group at > http://groups.google.com/group/thinking-sphinx?hl=en. > > -- You received this message because you are subscribed to the Google Groups "Thinking Sphinx" group. To post to this group, send email to [email protected]. To unsubscribe from this group, send email to [email protected]. For more options, visit this group at http://groups.google.com/group/thinking-sphinx?hl=en.
