mysql> SELECT IFNULL(MIN(`id`), 1), IFNULL(MAX(`id`), 1)
    -> FROM `businesses`;
+----------------------+----------------------+
| IFNULL(MIN(`id`), 1) | IFNULL(MAX(`id`), 1) |
+----------------------+----------------------+
|                    1 |          15250646237 |
+----------------------+----------------------+

On Dec 14, 4:32 pm, Pat Allan <[email protected]> wrote:
> Hmm, if you run the sql_query_range query yourself, what does it return?
>
> --
> Pat
>
> On 15/12/2009, at 11:24 AM, Steve H wrote:
>
>
>
> > Hi Pat,
>
> > Here is the sql_query:
>
> > sql_query = SELECT SQL_NO_CACHE `businesses`.`id` * 1 + 0 AS `id` ,
> > CAST(`businesses`.`name` AS CHAR) AS `name`, CAST(GROUP_CONCAT
> > (DISTINCT IFNULL(`cs_categories`.`name`, '0') SEPARATOR ' ') AS CHAR)
> > AS `category_names`, CAST(CONCAT_WS(' ', `city`.`city`,
> > `city`.`state_code`) AS CHAR) AS `city`, `businesses`.`id` AS
> > `sphinx_internal_id`, 4054125678 AS `class_crc`, '4054125678' AS
> > `subclass_crcs`, 0 AS `sphinx_deleted`, IFNULL(`businesses`.`name`,
> > '') AS `name_sort`, GROUP_CONCAT(DISTINCT IFNULL(IFNULL
> > (`cs_categories`.`name`, ''), '0') SEPARATOR ' ') AS
> > `category_names_sort`, CONCAT_WS(' ', IFNULL(`city`.`city`, ''), IFNULL
> > (`city`.`state_code`, '')) AS `city_sort`, `businesses`.`lat` AS
> > `lat`, `businesses`.`lon` AS `lon` FROM `businesses`    LEFT OUTER
> > JOIN `categorizations` ON (`businesses`.`id` =
> > `categorizations`.`business_id`)  LEFT OUTER JOIN `cs_categories` ON
> > (`cs_categories`.`id` = `categorizations`.`cs_category_id`)   LEFT
> > OUTER JOIN `city` ON `city`.city_id = `businesses`.city_id  WHERE
> > `businesses`.`id` >= $start AND `businesses`.`id` <= $end GROUP BY
> > `businesses`.`id`  ORDER BY NULL
> >  sql_query_range = SELECT IFNULL(MIN(`id`), 1), IFNULL(MAX(`id`), 1)
> > FROM `businesses`
>
> > We turned on logging for every query in mysqld to troubleshoot this.
> > This is the last query run:
>
> > SELECT SQL_NO_CACHE `businesses`.`id` * 1 + 0 AS `id` , CAST
> > (`businesses`.`name` AS CHAR) AS `name`, CAST(GROUP
> > _CONCAT(DISTINCT IFNULL(`cs_categories`.`name`, '0') SEPARATOR ' ') AS
> > CHAR) AS `category_names`, CAST(CONCAT_WS(' ', `city`.`city`,
> > `city`.`state_code`) AS CHAR) AS `city`, `businesses`.`id` AS
> > `sphinx_internal_id`, 4054125678 AS `class_crc`, '4054125678' AS
> > `subclass_crcs`, 0 AS `sphinx_deleted`, IFNULL(`businesses`.`name`,
> > '') AS `name_sort`, GROUP_CONCAT(DISTINCT IFNULL(IFNULL
> > (`cs_categories`.`name`, ''), '0') SEPARATOR ' ') AS
> > `category_names_sort`, CONCAT_WS(' ', IFNULL(`city`.`city`, ''), IFNULL
> > (`city`.`state_code`, '')) AS `city_sort`, `businesses`.`lat` AS
> > `lat`, `businesses`.`lon` AS `lon` FROM `businesses`    LEFT OUTER
> > JOIN `categorizations` ON (`businesses`.`id` =
> > `categorizations`.`business_id`)  LEFT OUTER JOIN `cs_categories` ON
> > (`cs_categories`.`id` = `categorizations`.`cs_category_id`)   LEFT
> > OUTER JOIN `city` ON `city`.city_id = `businesses`.city_id  WHERE
> > `businesses`.`id` >= 2365000001 AND `businesses`.`id` <= 2365744349
> > GROUP BY `businesses`.`id`  ORDER BY NULL
>
> > Which returns zero rows.  In fact, the last 15+ queries return zero
> > rows.  The businesses table has no records with ids
> > 2034010806...2500480805.  And of course, the maximum id is
> > 15250646237.  We need *all* the rows indexed, of course...  :)
>
> > So should we be increasing the step_size to like 10 million?
>
> > -Steve
>
> > On Dec 14, 2:58 pm, Pat Allan <[email protected]> wrote:
> >> Hi Steve
>
> >> Thinking Sphinx uses left outer joins (because that's what ActiveRecord 
> >> uses, and it's the best fit in this situation), and yes, one doc is one 
> >> row.
>
> >> There's nothing odd in your index definition - so let's have a look at the 
> >> sql_query value of the source business_core_0, in 
> >> config/development.sphinx.conf (or production - they should be the same). 
> >> You should be able to run it manually, see if it returns the right results 
> >> (perhaps over a particular step, to keep things reasonably fast and 
> >> manageable).
>
> >> --
> >> Pat
>
> >> On 15/12/2009, at 3:26 AM, Steve H wrote:
>
> >>> Thanks for the lightning-fast reply Pat.  :)
>
> >>>  define_index do
> >>>    indexes :name, :sortable => true
> >>>    indexes categories_and_tags(:name), :as
> >>> => :category_names, :sortable => true
> >>>    indexes [city.city, city.state_code], :as => :city, :sortable =>
> >>> true
> >>>    set_property :group_concat_max_len => 8192
> >>>    # attributes
> >>>    has lat, lon
> >>>  end
>
> >>> Every business has at least one category or tag, but I'm kind of
> >>> assuming here that the indexer is doing a full left join (ie, not an
> >>> inner join).
>
> >>> Could something in this block be causing the problem?  And is 1 doc =
> >>> 1 row in the db?
>
> >>> -Steve
>
> >>> On Dec 13, 7:03 pm, Pat Allan <[email protected]> wrote:
> >>>> Hi Steve
>
> >>>> What does your define_index block for Business look like?
>
> >>>> --
> >>>> Pat
>
> >>>> On 14/12/2009, at 1:44 PM, Steve H wrote:
>
> >>>>> OK first off -- I'm not 100% certain that 1 doc = 1 record.  But if it
> >>>>> DOES then, sphinx definitely didn't collect enough.  We have over 13
> >>>>> million rows in our database, but this was the output after indexing:
>
> >>>>> -----------------------------------------------
> >>>>> indexing index 'business_core'...
> >>>>> collected 1199 docs, 0.7 MB
> >>>>> collected 0 attr values
> >>>>> sorted 0.0 Mvalues, 100.0% done
> >>>>> sorted 0.1 Mhits, 100.0% done
> >>>>> total 1199 docs, 685923 bytes
> >>>>> total 1381.345 sec, 496.56 bytes/sec, 0.87 docs/sec
> >>>>> distributed index 'business' can not be directly indexed; skipping.
> >>>>> Started successfully (pid 32011).
> >>>>> -----------------------------------------------
>
> >>>>> After reading some of the docs, I thought... maybe I have to increase
> >>>>> the sql_range_step in the sphinx.yml or increase the
> >>>>> group_concat_max_len property.  So I did those things, but the output
> >>>>> after indexing is the same (albeit much much faster  :)
>
> >>>>> -----------------------------------------------
> >>>>> Stopped search daemon (pid 2810).
> >>>>> Generating Configuration to /home/steve/insiderpages/config/
> >>>>> sphinx_dev.sphinx.conf
> >>>>> Sphinx 0.9.8.1-release (r1533)
> >>>>> Copyright (c) 2001-2008, Andrew Aksyonoff
>
> >>>>> using config file '/home/steve/insiderpages/config/
> >>>>> sphinx_dev.sphinx.conf'...
> >>>>> indexing index 'business_core'...
> >>>>> collected 1199 docs, 0.1 MB
> >>>>> collected 0 attr values
> >>>>> sorted 0.0 Mvalues, 100.0% done
> >>>>> sorted 0.0 Mhits, 100.0% done
> >>>>> total 1199 docs, 79605 bytes
> >>>>> total 0.447 sec, 177901.38 bytes/sec, 2679.53 docs/sec
> >>>>> distributed index 'business' can not be directly indexed; skipping.
> >>>>> Started successfully (pid 6365).
> >>>>> -----------------------------------------------
>
> >>>>> What's the problem?
>
> >>>>> -Steve
>
> >>>>> PS:
> >>>>> mysql> select count(id) from businesses;
> >>>>> +-----------+
> >>>>> | count(id) |
> >>>>> +-----------+
> >>>>> |  13395683 |
> >>>>> +-----------+
> >>>>> 1 row in set (0.00 sec)
>
> >>>>> mysql> select max(id) from businesses;
> >>>>> +-------------+
> >>>>> | max(id)     |
> >>>>> +-------------+
> >>>>> | 15250646126 |
> >>>>> +-------------+
> >>>>> 1 row in set (0.00 sec)
>
> >>>>> --
>
> >>>>> You received this message because you are subscribed to the Google 
> >>>>> Groups "Thinking Sphinx" group.
> >>>>> To post to this group, send email to [email protected].
> >>>>> To unsubscribe from this group, send email to 
> >>>>> [email protected].
> >>>>> For more options, visit this group 
> >>>>> athttp://groups.google.com/group/thinking-sphinx?hl=en.
>
> >>> --
>
> >>> You received this message because you are subscribed to the Google Groups 
> >>> "Thinking Sphinx" group.
> >>> To post to this group, send email to [email protected].
> >>> To unsubscribe from this group, send email to 
> >>> [email protected].
> >>> For more options, visit this group 
> >>> athttp://groups.google.com/group/thinking-sphinx?hl=en.
>
> > --
>
> > You received this message because you are subscribed to the Google Groups 
> > "Thinking Sphinx" group.
> > To post to this group, send email to [email protected].
> > To unsubscribe from this group, send email to 
> > [email protected].
> > For more options, visit this group 
> > athttp://groups.google.com/group/thinking-sphinx?hl=en.

--

You received this message because you are subscribed to the Google Groups 
"Thinking Sphinx" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/thinking-sphinx?hl=en.


Reply via email to