Yeah, that's my thought too - was wondering if it was MySQL at fault, or Sphinx.

Sphinx 0.9.9 (and maybe 0.9.8) has a --enable-id64 flag when you configure - I 
think it's worth recompiling and trying indexing again.

-- 
Pat

On 15/12/2009, at 11:40 AM, Steve H wrote:

> I found one thing very strange... no matter what the sql_range_step is
> set to (default - 10,000,000), the last query always has an upper
> limit of 2365744349.  My very astute manager noticed that this is kind
> of in the neighborhood of the largest possible value for an int.
> Maybe that has something to do with it?
> 
> -Steve
> 
> On Dec 14, 4:35 pm, Steve H <[email protected]> wrote:
>> mysql> SELECT IFNULL(MIN(`id`), 1), IFNULL(MAX(`id`), 1)
>>     -> FROM `businesses`;
>> +----------------------+----------------------+
>> | IFNULL(MIN(`id`), 1) | IFNULL(MAX(`id`), 1) |
>> +----------------------+----------------------+
>> |                    1 |          15250646237 |
>> +----------------------+----------------------+
>> 
>> On Dec 14, 4:32 pm, Pat Allan <[email protected]> wrote:
>> 
>> 
>> 
>>> Hmm, if you run the sql_query_range query yourself, what does it return?
>> 
>>> --
>>> Pat
>> 
>>> On 15/12/2009, at 11:24 AM, Steve H wrote:
>> 
>>>> Hi Pat,
>> 
>>>> Here is the sql_query:
>> 
>>>> sql_query = SELECT SQL_NO_CACHE `businesses`.`id` * 1 + 0 AS `id` ,
>>>> CAST(`businesses`.`name` AS CHAR) AS `name`, CAST(GROUP_CONCAT
>>>> (DISTINCT IFNULL(`cs_categories`.`name`, '0') SEPARATOR ' ') AS CHAR)
>>>> AS `category_names`, CAST(CONCAT_WS(' ', `city`.`city`,
>>>> `city`.`state_code`) AS CHAR) AS `city`, `businesses`.`id` AS
>>>> `sphinx_internal_id`, 4054125678 AS `class_crc`, '4054125678' AS
>>>> `subclass_crcs`, 0 AS `sphinx_deleted`, IFNULL(`businesses`.`name`,
>>>> '') AS `name_sort`, GROUP_CONCAT(DISTINCT IFNULL(IFNULL
>>>> (`cs_categories`.`name`, ''), '0') SEPARATOR ' ') AS
>>>> `category_names_sort`, CONCAT_WS(' ', IFNULL(`city`.`city`, ''), IFNULL
>>>> (`city`.`state_code`, '')) AS `city_sort`, `businesses`.`lat` AS
>>>> `lat`, `businesses`.`lon` AS `lon` FROM `businesses`    LEFT OUTER
>>>> JOIN `categorizations` ON (`businesses`.`id` =
>>>> `categorizations`.`business_id`)  LEFT OUTER JOIN `cs_categories` ON
>>>> (`cs_categories`.`id` = `categorizations`.`cs_category_id`)   LEFT
>>>> OUTER JOIN `city` ON `city`.city_id = `businesses`.city_id  WHERE
>>>> `businesses`.`id` >= $start AND `businesses`.`id` <= $end GROUP BY
>>>> `businesses`.`id`  ORDER BY NULL
>>>>  sql_query_range = SELECT IFNULL(MIN(`id`), 1), IFNULL(MAX(`id`), 1)
>>>> FROM `businesses`
>> 
>>>> We turned on logging for every query in mysqld to troubleshoot this.
>>>> This is the last query run:
>> 
>>>> SELECT SQL_NO_CACHE `businesses`.`id` * 1 + 0 AS `id` , CAST
>>>> (`businesses`.`name` AS CHAR) AS `name`, CAST(GROUP
>>>> _CONCAT(DISTINCT IFNULL(`cs_categories`.`name`, '0') SEPARATOR ' ') AS
>>>> CHAR) AS `category_names`, CAST(CONCAT_WS(' ', `city`.`city`,
>>>> `city`.`state_code`) AS CHAR) AS `city`, `businesses`.`id` AS
>>>> `sphinx_internal_id`, 4054125678 AS `class_crc`, '4054125678' AS
>>>> `subclass_crcs`, 0 AS `sphinx_deleted`, IFNULL(`businesses`.`name`,
>>>> '') AS `name_sort`, GROUP_CONCAT(DISTINCT IFNULL(IFNULL
>>>> (`cs_categories`.`name`, ''), '0') SEPARATOR ' ') AS
>>>> `category_names_sort`, CONCAT_WS(' ', IFNULL(`city`.`city`, ''), IFNULL
>>>> (`city`.`state_code`, '')) AS `city_sort`, `businesses`.`lat` AS
>>>> `lat`, `businesses`.`lon` AS `lon` FROM `businesses`    LEFT OUTER
>>>> JOIN `categorizations` ON (`businesses`.`id` =
>>>> `categorizations`.`business_id`)  LEFT OUTER JOIN `cs_categories` ON
>>>> (`cs_categories`.`id` = `categorizations`.`cs_category_id`)   LEFT
>>>> OUTER JOIN `city` ON `city`.city_id = `businesses`.city_id  WHERE
>>>> `businesses`.`id` >= 2365000001 AND `businesses`.`id` <= 2365744349
>>>> GROUP BY `businesses`.`id`  ORDER BY NULL
>> 
>>>> Which returns zero rows.  In fact, the last 15+ queries return zero
>>>> rows.  The businesses table has no records with ids
>>>> 2034010806...2500480805.  And of course, the maximum id is
>>>> 15250646237.  We need *all* the rows indexed, of course...  :)
>> 
>>>> So should we be increasing the step_size to like 10 million?
>> 
>>>> -Steve
>> 
>>>> On Dec 14, 2:58 pm, Pat Allan <[email protected]> wrote:
>>>>> Hi Steve
>> 
>>>>> Thinking Sphinx uses left outer joins (because that's what ActiveRecord 
>>>>> uses, and it's the best fit in this situation), and yes, one doc is one 
>>>>> row.
>> 
>>>>> There's nothing odd in your index definition - so let's have a look at 
>>>>> the sql_query value of the source business_core_0, in 
>>>>> config/development.sphinx.conf (or production - they should be the same). 
>>>>> You should be able to run it manually, see if it returns the right 
>>>>> results (perhaps over a particular step, to keep things reasonably fast 
>>>>> and manageable).
>> 
>>>>> --
>>>>> Pat
>> 
>>>>> On 15/12/2009, at 3:26 AM, Steve H wrote:
>> 
>>>>>> Thanks for the lightning-fast reply Pat.  :)
>> 
>>>>>>  define_index do
>>>>>>    indexes :name, :sortable => true
>>>>>>    indexes categories_and_tags(:name), :as
>>>>>> => :category_names, :sortable => true
>>>>>>    indexes [city.city, city.state_code], :as => :city, :sortable =>
>>>>>> true
>>>>>>    set_property :group_concat_max_len => 8192
>>>>>>    # attributes
>>>>>>    has lat, lon
>>>>>>  end
>> 
>>>>>> Every business has at least one category or tag, but I'm kind of
>>>>>> assuming here that the indexer is doing a full left join (ie, not an
>>>>>> inner join).
>> 
>>>>>> Could something in this block be causing the problem?  And is 1 doc =
>>>>>> 1 row in the db?
>> 
>>>>>> -Steve
>> 
>>>>>> On Dec 13, 7:03 pm, Pat Allan <[email protected]> wrote:
>>>>>>> Hi Steve
>> 
>>>>>>> What does your define_index block for Business look like?
>> 
>>>>>>> --
>>>>>>> Pat
>> 
>>>>>>> On 14/12/2009, at 1:44 PM, Steve H wrote:
>> 
>>>>>>>> OK first off -- I'm not 100% certain that 1 doc = 1 record.  But if it
>>>>>>>> DOES then, sphinx definitely didn't collect enough.  We have over 13
>>>>>>>> million rows in our database, but this was the output after indexing:
>> 
>>>>>>>> -----------------------------------------------
>>>>>>>> indexing index 'business_core'...
>>>>>>>> collected 1199 docs, 0.7 MB
>>>>>>>> collected 0 attr values
>>>>>>>> sorted 0.0 Mvalues, 100.0% done
>>>>>>>> sorted 0.1 Mhits, 100.0% done
>>>>>>>> total 1199 docs, 685923 bytes
>>>>>>>> total 1381.345 sec, 496.56 bytes/sec, 0.87 docs/sec
>>>>>>>> distributed index 'business' can not be directly indexed; skipping.
>>>>>>>> Started successfully (pid 32011).
>>>>>>>> -----------------------------------------------
>> 
>>>>>>>> After reading some of the docs, I thought... maybe I have to increase
>>>>>>>> the sql_range_step in the sphinx.yml or increase the
>>>>>>>> group_concat_max_len property.  So I did those things, but the output
>>>>>>>> after indexing is the same (albeit much much faster  :)
>> 
>>>>>>>> -----------------------------------------------
>>>>>>>> Stopped search daemon (pid 2810).
>>>>>>>> Generating Configuration to /home/steve/insiderpages/config/
>>>>>>>> sphinx_dev.sphinx.conf
>>>>>>>> Sphinx 0.9.8.1-release (r1533)
>>>>>>>> Copyright (c) 2001-2008, Andrew Aksyonoff
>> 
>>>>>>>> using config file '/home/steve/insiderpages/config/
>>>>>>>> sphinx_dev.sphinx.conf'...
>>>>>>>> indexing index 'business_core'...
>>>>>>>> collected 1199 docs, 0.1 MB
>>>>>>>> collected 0 attr values
>>>>>>>> sorted 0.0 Mvalues, 100.0% done
>>>>>>>> sorted 0.0 Mhits, 100.0% done
>>>>>>>> total 1199 docs, 79605 bytes
>>>>>>>> total 0.447 sec, 177901.38 bytes/sec, 2679.53 docs/sec
>>>>>>>> distributed index 'business' can not be directly indexed; skipping.
>>>>>>>> Started successfully (pid 6365).
>>>>>>>> -----------------------------------------------
>> 
>>>>>>>> What's the problem?
>> 
>>>>>>>> -Steve
>> 
>>>>>>>> PS:
>>>>>>>> mysql> select count(id) from businesses;
>>>>>>>> +-----------+
>>>>>>>> | count(id) |
>>>>>>>> +-----------+
>>>>>>>> |  13395683 |
>>>>>>>> +-----------+
>>>>>>>> 1 row in set (0.00 sec)
>> 
>>>>>>>> mysql> select max(id) from businesses;
>>>>>>>> +-------------+
>>>>>>>> | max(id)     |
>>>>>>>> +-------------+
>>>>>>>> | 15250646126 |
>>>>>>>> +-------------+
>>>>>>>> 1 row in set (0.00 sec)
>> 
>>>>>>>> --
>> 
>>>>>>>> You received this message because you are subscribed to the Google 
>>>>>>>> Groups "Thinking Sphinx" group.
>>>>>>>> To post to this group, send email to [email protected].
>>>>>>>> To unsubscribe from this group, send email to 
>>>>>>>> [email protected].
>>>>>>>> For more options, visit this group 
>>>>>>>> athttp://groups.google.com/group/thinking-sphinx?hl=en.
>> 
>>>>>> --
>> 
>>>>>> You received this message because you are subscribed to the Google 
>>>>>> Groups "Thinking Sphinx" group.
>>>>>> To post to this group, send email to [email protected].
>>>>>> To unsubscribe from this group, send email to 
>>>>>> [email protected].
>>>>>> For more options, visit this group 
>>>>>> athttp://groups.google.com/group/thinking-sphinx?hl=en.
>> 
>>>> --
>> 
>>>> You received this message because you are subscribed to the Google Groups 
>>>> "Thinking Sphinx" group.
>>>> To post to this group, send email to [email protected].
>>>> To unsubscribe from this group, send email to 
>>>> [email protected].
>>>> For more options, visit this group 
>>>> athttp://groups.google.com/group/thinking-sphinx?hl=en.
> 
> --
> 
> You received this message because you are subscribed to the Google Groups 
> "Thinking Sphinx" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to 
> [email protected].
> For more options, visit this group at 
> http://groups.google.com/group/thinking-sphinx?hl=en.
> 
> 

--

You received this message because you are subscribed to the Google Groups 
"Thinking Sphinx" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/thinking-sphinx?hl=en.


Reply via email to