Hmm, if you run the sql_query_range query yourself, what does it return?
--
Pat
On 15/12/2009, at 11:24 AM, Steve H wrote:
> Hi Pat,
>
> Here is the sql_query:
>
> sql_query = SELECT SQL_NO_CACHE `businesses`.`id` * 1 + 0 AS `id` ,
> CAST(`businesses`.`name` AS CHAR) AS `name`, CAST(GROUP_CONCAT
> (DISTINCT IFNULL(`cs_categories`.`name`, '0') SEPARATOR ' ') AS CHAR)
> AS `category_names`, CAST(CONCAT_WS(' ', `city`.`city`,
> `city`.`state_code`) AS CHAR) AS `city`, `businesses`.`id` AS
> `sphinx_internal_id`, 4054125678 AS `class_crc`, '4054125678' AS
> `subclass_crcs`, 0 AS `sphinx_deleted`, IFNULL(`businesses`.`name`,
> '') AS `name_sort`, GROUP_CONCAT(DISTINCT IFNULL(IFNULL
> (`cs_categories`.`name`, ''), '0') SEPARATOR ' ') AS
> `category_names_sort`, CONCAT_WS(' ', IFNULL(`city`.`city`, ''), IFNULL
> (`city`.`state_code`, '')) AS `city_sort`, `businesses`.`lat` AS
> `lat`, `businesses`.`lon` AS `lon` FROM `businesses` LEFT OUTER
> JOIN `categorizations` ON (`businesses`.`id` =
> `categorizations`.`business_id`) LEFT OUTER JOIN `cs_categories` ON
> (`cs_categories`.`id` = `categorizations`.`cs_category_id`) LEFT
> OUTER JOIN `city` ON `city`.city_id = `businesses`.city_id WHERE
> `businesses`.`id` >= $start AND `businesses`.`id` <= $end GROUP BY
> `businesses`.`id` ORDER BY NULL
> sql_query_range = SELECT IFNULL(MIN(`id`), 1), IFNULL(MAX(`id`), 1)
> FROM `businesses`
>
> We turned on logging for every query in mysqld to troubleshoot this.
> This is the last query run:
>
> SELECT SQL_NO_CACHE `businesses`.`id` * 1 + 0 AS `id` , CAST
> (`businesses`.`name` AS CHAR) AS `name`, CAST(GROUP
> _CONCAT(DISTINCT IFNULL(`cs_categories`.`name`, '0') SEPARATOR ' ') AS
> CHAR) AS `category_names`, CAST(CONCAT_WS(' ', `city`.`city`,
> `city`.`state_code`) AS CHAR) AS `city`, `businesses`.`id` AS
> `sphinx_internal_id`, 4054125678 AS `class_crc`, '4054125678' AS
> `subclass_crcs`, 0 AS `sphinx_deleted`, IFNULL(`businesses`.`name`,
> '') AS `name_sort`, GROUP_CONCAT(DISTINCT IFNULL(IFNULL
> (`cs_categories`.`name`, ''), '0') SEPARATOR ' ') AS
> `category_names_sort`, CONCAT_WS(' ', IFNULL(`city`.`city`, ''), IFNULL
> (`city`.`state_code`, '')) AS `city_sort`, `businesses`.`lat` AS
> `lat`, `businesses`.`lon` AS `lon` FROM `businesses` LEFT OUTER
> JOIN `categorizations` ON (`businesses`.`id` =
> `categorizations`.`business_id`) LEFT OUTER JOIN `cs_categories` ON
> (`cs_categories`.`id` = `categorizations`.`cs_category_id`) LEFT
> OUTER JOIN `city` ON `city`.city_id = `businesses`.city_id WHERE
> `businesses`.`id` >= 2365000001 AND `businesses`.`id` <= 2365744349
> GROUP BY `businesses`.`id` ORDER BY NULL
>
> Which returns zero rows. In fact, the last 15+ queries return zero
> rows. The businesses table has no records with ids
> 2034010806...2500480805. And of course, the maximum id is
> 15250646237. We need *all* the rows indexed, of course... :)
>
> So should we be increasing the step_size to like 10 million?
>
> -Steve
>
> On Dec 14, 2:58 pm, Pat Allan <[email protected]> wrote:
>> Hi Steve
>>
>> Thinking Sphinx uses left outer joins (because that's what ActiveRecord
>> uses, and it's the best fit in this situation), and yes, one doc is one row.
>>
>> There's nothing odd in your index definition - so let's have a look at the
>> sql_query value of the source business_core_0, in
>> config/development.sphinx.conf (or production - they should be the same).
>> You should be able to run it manually, see if it returns the right results
>> (perhaps over a particular step, to keep things reasonably fast and
>> manageable).
>>
>> --
>> Pat
>>
>> On 15/12/2009, at 3:26 AM, Steve H wrote:
>>
>>
>>
>>> Thanks for the lightning-fast reply Pat. :)
>>
>>> define_index do
>>> indexes :name, :sortable => true
>>> indexes categories_and_tags(:name), :as
>>> => :category_names, :sortable => true
>>> indexes [city.city, city.state_code], :as => :city, :sortable =>
>>> true
>>> set_property :group_concat_max_len => 8192
>>> # attributes
>>> has lat, lon
>>> end
>>
>>> Every business has at least one category or tag, but I'm kind of
>>> assuming here that the indexer is doing a full left join (ie, not an
>>> inner join).
>>
>>> Could something in this block be causing the problem? And is 1 doc =
>>> 1 row in the db?
>>
>>> -Steve
>>
>>> On Dec 13, 7:03 pm, Pat Allan <[email protected]> wrote:
>>>> Hi Steve
>>
>>>> What does your define_index block for Business look like?
>>
>>>> --
>>>> Pat
>>
>>>> On 14/12/2009, at 1:44 PM, Steve H wrote:
>>
>>>>> OK first off -- I'm not 100% certain that 1 doc = 1 record. But if it
>>>>> DOES then, sphinx definitely didn't collect enough. We have over 13
>>>>> million rows in our database, but this was the output after indexing:
>>
>>>>> -----------------------------------------------
>>>>> indexing index 'business_core'...
>>>>> collected 1199 docs, 0.7 MB
>>>>> collected 0 attr values
>>>>> sorted 0.0 Mvalues, 100.0% done
>>>>> sorted 0.1 Mhits, 100.0% done
>>>>> total 1199 docs, 685923 bytes
>>>>> total 1381.345 sec, 496.56 bytes/sec, 0.87 docs/sec
>>>>> distributed index 'business' can not be directly indexed; skipping.
>>>>> Started successfully (pid 32011).
>>>>> -----------------------------------------------
>>
>>>>> After reading some of the docs, I thought... maybe I have to increase
>>>>> the sql_range_step in the sphinx.yml or increase the
>>>>> group_concat_max_len property. So I did those things, but the output
>>>>> after indexing is the same (albeit much much faster :)
>>
>>>>> -----------------------------------------------
>>>>> Stopped search daemon (pid 2810).
>>>>> Generating Configuration to /home/steve/insiderpages/config/
>>>>> sphinx_dev.sphinx.conf
>>>>> Sphinx 0.9.8.1-release (r1533)
>>>>> Copyright (c) 2001-2008, Andrew Aksyonoff
>>
>>>>> using config file '/home/steve/insiderpages/config/
>>>>> sphinx_dev.sphinx.conf'...
>>>>> indexing index 'business_core'...
>>>>> collected 1199 docs, 0.1 MB
>>>>> collected 0 attr values
>>>>> sorted 0.0 Mvalues, 100.0% done
>>>>> sorted 0.0 Mhits, 100.0% done
>>>>> total 1199 docs, 79605 bytes
>>>>> total 0.447 sec, 177901.38 bytes/sec, 2679.53 docs/sec
>>>>> distributed index 'business' can not be directly indexed; skipping.
>>>>> Started successfully (pid 6365).
>>>>> -----------------------------------------------
>>
>>>>> What's the problem?
>>
>>>>> -Steve
>>
>>>>> PS:
>>>>> mysql> select count(id) from businesses;
>>>>> +-----------+
>>>>> | count(id) |
>>>>> +-----------+
>>>>> | 13395683 |
>>>>> +-----------+
>>>>> 1 row in set (0.00 sec)
>>
>>>>> mysql> select max(id) from businesses;
>>>>> +-------------+
>>>>> | max(id) |
>>>>> +-------------+
>>>>> | 15250646126 |
>>>>> +-------------+
>>>>> 1 row in set (0.00 sec)
>>
>>>>> --
>>
>>>>> You received this message because you are subscribed to the Google Groups
>>>>> "Thinking Sphinx" group.
>>>>> To post to this group, send email to [email protected].
>>>>> To unsubscribe from this group, send email to
>>>>> [email protected].
>>>>> For more options, visit this group
>>>>> athttp://groups.google.com/group/thinking-sphinx?hl=en.
>>
>>> --
>>
>>> You received this message because you are subscribed to the Google Groups
>>> "Thinking Sphinx" group.
>>> To post to this group, send email to [email protected].
>>> To unsubscribe from this group, send email to
>>> [email protected].
>>> For more options, visit this group
>>> athttp://groups.google.com/group/thinking-sphinx?hl=en.
>
> --
>
> You received this message because you are subscribed to the Google Groups
> "Thinking Sphinx" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/thinking-sphinx?hl=en.
>
>
--
You received this message because you are subscribed to the Google Groups
"Thinking Sphinx" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to
[email protected].
For more options, visit this group at
http://groups.google.com/group/thinking-sphinx?hl=en.