Consider it done. Ticket created. :) Cheers!
-Steve On Jan 5, 12:36 am, Pat Allan <[email protected]> wrote: > Hi Steve > > I've been thinking about this a bit today... it might be possible to default > to bigint, although that has performance issues I'm guessing (granted, I > doubt they're big ones). Maybe indexes need some flag to default all integers > to big ints... > > Consider it on my list of things to do. If you'd like to create a ticket for > this on GitHub, that'd be fantastic. > > Cheers > > -- > Pat > > On 05/01/2010, at 12:00 PM, Steve H wrote: > > > > > Pat -- > > > For you and anyone else who may run into this issue: > > We found that while the indexer successfully indexed all our models, > > the sphinx_internal_id was wrong for all models with large ids > > (anything 11 digits or more). This, of course, caused results brought > > back to be not found in the database. We were wondering why the > > "document" number returned by the search binary couldn't be used > > instead of sphinx_internal_id... but that aside, we found that > > sphinx_attr_bigint was introduced with 0.9.9, so we will be (upgrading > > and) modifying our generated config file appropriately. > > > Is there a way to specify that the sphinx_internal_id should be a > > sphinx_attr_bigint in the sphinx.yml file? > > > Thanks, > > Steve > > > On Dec 14 2009, 5:30 pm, Steve H <[email protected]> wrote: > >> Thanks Pat!! That certainly did the trick. ~13.5 million businesses > >> with their associated city and categories indexed in about 15 > >> minutes. Of course the indexing was being done on the same box that > >> the mysqld was being run on, but still -- pretty dang fast. > > >> Thanks again! Now for the fun stuff: actual implementation, > >> deployment concerns, etc. > > >> -Steve > > >> On Dec 14, 4:53 pm, Pat Allan <[email protected]> wrote: > > >>> Yeah, that's my thought too - was wondering if it was MySQL at fault, or > >>> Sphinx. > > >>> Sphinx 0.9.9 (and maybe 0.9.8) has a --enable-id64 flag when you > >>> configure - I think it's worth recompiling and trying indexing again. > > >>> -- > >>> Pat > > >>> On 15/12/2009, at 11:40 AM, Steve H wrote: > > >>>> I found one thing very strange... no matter what the sql_range_step is > >>>> set to (default - 10,000,000), the last query always has an upper > >>>> limit of 2365744349. My very astute manager noticed that this is kind > >>>> of in the neighborhood of the largest possible value for an int. > >>>> Maybe that has something to do with it? > > >>>> -Steve > > >>>> On Dec 14, 4:35 pm, Steve H <[email protected]> wrote: > >>>>> mysql> SELECT IFNULL(MIN(`id`), 1), IFNULL(MAX(`id`), 1) > >>>>> -> FROM `businesses`; > >>>>> +----------------------+----------------------+ > >>>>> | IFNULL(MIN(`id`), 1) | IFNULL(MAX(`id`), 1) | > >>>>> +----------------------+----------------------+ > >>>>> | 1 | 15250646237 | > >>>>> +----------------------+----------------------+ > > >>>>> On Dec 14, 4:32 pm, Pat Allan <[email protected]> wrote: > > >>>>>> Hmm, if you run the sql_query_range query yourself, what does it > >>>>>> return? > > >>>>>> -- > >>>>>> Pat > > >>>>>> On 15/12/2009, at 11:24 AM, Steve H wrote: > > >>>>>>> Hi Pat, > > >>>>>>> Here is the sql_query: > > >>>>>>> sql_query = SELECT SQL_NO_CACHE `businesses`.`id` * 1 + 0 AS `id` , > >>>>>>> CAST(`businesses`.`name` AS CHAR) AS `name`, CAST(GROUP_CONCAT > >>>>>>> (DISTINCT IFNULL(`cs_categories`.`name`, '0') SEPARATOR ' ') AS CHAR) > >>>>>>> AS `category_names`, CAST(CONCAT_WS(' ', `city`.`city`, > >>>>>>> `city`.`state_code`) AS CHAR) AS `city`, `businesses`.`id` AS > >>>>>>> `sphinx_internal_id`, 4054125678 AS `class_crc`, '4054125678' AS > >>>>>>> `subclass_crcs`, 0 AS `sphinx_deleted`, IFNULL(`businesses`.`name`, > >>>>>>> '') AS `name_sort`, GROUP_CONCAT(DISTINCT IFNULL(IFNULL > >>>>>>> (`cs_categories`.`name`, ''), '0') SEPARATOR ' ') AS > >>>>>>> `category_names_sort`, CONCAT_WS(' ', IFNULL(`city`.`city`, ''), > >>>>>>> IFNULL > >>>>>>> (`city`.`state_code`, '')) AS `city_sort`, `businesses`.`lat` AS > >>>>>>> `lat`, `businesses`.`lon` AS `lon` FROM `businesses` LEFT OUTER > >>>>>>> JOIN `categorizations` ON (`businesses`.`id` = > >>>>>>> `categorizations`.`business_id`) LEFT OUTER JOIN `cs_categories` ON > >>>>>>> (`cs_categories`.`id` = `categorizations`.`cs_category_id`) LEFT > >>>>>>> OUTER JOIN `city` ON `city`.city_id = `businesses`.city_id WHERE > >>>>>>> `businesses`.`id` >= $start AND `businesses`.`id` <= $end GROUP BY > >>>>>>> `businesses`.`id` ORDER BY NULL > >>>>>>> sql_query_range = SELECT IFNULL(MIN(`id`), 1), IFNULL(MAX(`id`), 1) > >>>>>>> FROM `businesses` > > >>>>>>> We turned on logging for every query in mysqld to troubleshoot this. > >>>>>>> This is the last query run: > > >>>>>>> SELECT SQL_NO_CACHE `businesses`.`id` * 1 + 0 AS `id` , CAST > >>>>>>> (`businesses`.`name` AS CHAR) AS `name`, CAST(GROUP > >>>>>>> _CONCAT(DISTINCT IFNULL(`cs_categories`.`name`, '0') SEPARATOR ' ') AS > >>>>>>> CHAR) AS `category_names`, CAST(CONCAT_WS(' ', `city`.`city`, > >>>>>>> `city`.`state_code`) AS CHAR) AS `city`, `businesses`.`id` AS > >>>>>>> `sphinx_internal_id`, 4054125678 AS `class_crc`, '4054125678' AS > >>>>>>> `subclass_crcs`, 0 AS `sphinx_deleted`, IFNULL(`businesses`.`name`, > >>>>>>> '') AS `name_sort`, GROUP_CONCAT(DISTINCT IFNULL(IFNULL > >>>>>>> (`cs_categories`.`name`, ''), '0') SEPARATOR ' ') AS > >>>>>>> `category_names_sort`, CONCAT_WS(' ', IFNULL(`city`.`city`, ''), > >>>>>>> IFNULL > >>>>>>> (`city`.`state_code`, '')) AS `city_sort`, `businesses`.`lat` AS > >>>>>>> `lat`, `businesses`.`lon` AS `lon` FROM `businesses` LEFT OUTER > >>>>>>> JOIN `categorizations` ON (`businesses`.`id` = > >>>>>>> `categorizations`.`business_id`) LEFT OUTER JOIN `cs_categories` ON > >>>>>>> (`cs_categories`.`id` = `categorizations`.`cs_category_id`) LEFT > >>>>>>> OUTER JOIN `city` ON `city`.city_id = `businesses`.city_id WHERE > >>>>>>> `businesses`.`id` >= 2365000001 AND `businesses`.`id` <= 2365744349 > >>>>>>> GROUP BY `businesses`.`id` ORDER BY NULL > > >>>>>>> Which returns zero rows. In fact, the last 15+ queries return zero > >>>>>>> rows. The businesses table has no records with ids > >>>>>>> 2034010806...2500480805. And of course, the maximum id is > >>>>>>> 15250646237. We need *all* the rows indexed, of course... :) > > >>>>>>> So should we be increasing the step_size to like 10 million? > > >>>>>>> -Steve > > >>>>>>> On Dec 14, 2:58 pm, Pat Allan <[email protected]> wrote: > >>>>>>>> Hi Steve > > >>>>>>>> Thinking Sphinx uses left outer joins (because that's what > >>>>>>>> ActiveRecord uses, and it's the best fit in this situation), and > >>>>>>>> yes, one doc is one row. > > >>>>>>>> There's nothing odd in your index definition - so let's have a look > >>>>>>>> at the sql_query value of the source business_core_0, in > >>>>>>>> config/development.sphinx.conf (or production - they should be the > >>>>>>>> same). You should be able to run it manually, see if it returns the > >>>>>>>> right results (perhaps over a particular step, to keep things > >>>>>>>> reasonably fast and manageable). > > >>>>>>>> -- > >>>>>>>> Pat > > >>>>>>>> On 15/12/2009, at 3:26 AM, Steve H wrote: > > >>>>>>>>> Thanks for the lightning-fast reply Pat. :) > > >>>>>>>>> define_index do > >>>>>>>>> indexes :name, :sortable => true > >>>>>>>>> indexes categories_and_tags(:name), :as > >>>>>>>>> => :category_names, :sortable => true > >>>>>>>>> indexes [city.city, city.state_code], :as => :city, :sortable => > >>>>>>>>> true > >>>>>>>>> set_property :group_concat_max_len => 8192 > >>>>>>>>> # attributes > >>>>>>>>> has lat, lon > >>>>>>>>> end > > >>>>>>>>> Every business has at least one category or tag, but I'm kind of > >>>>>>>>> assuming here that the indexer is doing a full left join (ie, not an > >>>>>>>>> inner join). > > >>>>>>>>> Could something in this block be causing the problem? And is 1 doc > >>>>>>>>> = > >>>>>>>>> 1 row in the db? > > >>>>>>>>> -Steve > > >>>>>>>>> On Dec 13, 7:03 pm, Pat Allan <[email protected]> wrote: > >>>>>>>>>> Hi Steve > > >>>>>>>>>> What does your define_index block for Business look like? > > >>>>>>>>>> -- > >>>>>>>>>> Pat > > >>>>>>>>>> On 14/12/2009, at 1:44 PM, Steve H wrote: > > >>>>>>>>>>> OK first off -- I'm not 100% certain that 1 doc = 1 record. But > >>>>>>>>>>> if it > >>>>>>>>>>> DOES then, sphinx definitely didn't collect enough. We have over > >>>>>>>>>>> 13 > >>>>>>>>>>> million rows in our database, but this was the output after > >>>>>>>>>>> indexing: > > >>>>>>>>>>> ----------------------------------------------- > >>>>>>>>>>> indexing index 'business_core'... > >>>>>>>>>>> collected 1199 docs, 0.7 MB > >>>>>>>>>>> collected 0 attr values > >>>>>>>>>>> sorted 0.0 Mvalues, 100.0% done > >>>>>>>>>>> sorted 0.1 Mhits, 100.0% done > >>>>>>>>>>> total 1199 docs, 685923 bytes > >>>>>>>>>>> total 1381.345 sec, 496.56 bytes/sec, 0.87 docs/sec > >>>>>>>>>>> distributed index 'business' can not be directly indexed; > >>>>>>>>>>> skipping. > >>>>>>>>>>> Started successfully (pid 32011). > >>>>>>>>>>> ----------------------------------------------- > > >>>>>>>>>>> After reading some of the docs, I thought... maybe I have to > >>>>>>>>>>> increase > >>>>>>>>>>> the sql_range_step in the sphinx.yml or increase the > >>>>>>>>>>> group_concat_max_len property. So I did those things, but the > >>>>>>>>>>> output > >>>>>>>>>>> after indexing is the same (albeit much much faster :) > > >>>>>>>>>>> ----------------------------------------------- > >>>>>>>>>>> Stopped search daemon (pid 2810). > >>>>>>>>>>> Generating Configuration to /home/steve/insiderpages/config/ > >>>>>>>>>>> sphinx_dev.sphinx.conf > >>>>>>>>>>> Sphinx 0.9.8.1-release (r1533) > >>>>>>>>>>> Copyright (c) 2001-2008, Andrew Aksyonoff > > >>>>>>>>>>> using config file '/home/steve/insiderpages/config/ > >>>>>>>>>>> sphinx_dev.sphinx.conf'... > >>>>>>>>>>> indexing index 'business_core'... > >>>>>>>>>>> collected 1199 docs, 0.1 MB > >>>>>>>>>>> collected 0 attr values > >>>>>>>>>>> sorted 0.0 Mvalues, 100.0% done > >>>>>>>>>>> sorted 0.0 Mhits, 100.0% done > >>>>>>>>>>> total 1199 docs, 79605 bytes > >>>>>>>>>>> total 0.447 sec, 177901.38 bytes/sec, 2679.53 docs/sec > >>>>>>>>>>> distributed index 'business' can not be directly indexed; > >>>>>>>>>>> skipping. > >>>>>>>>>>> Started successfully (pid 6365). > >>>>>>>>>>> ----------------------------------------------- > > >>>>>>>>>>> What's the problem? > > ... > > read more »
-- You received this message because you are subscribed to the Google Groups "Thinking Sphinx" group. To post to this group, send email to [email protected]. To unsubscribe from this group, send email to [email protected]. For more options, visit this group at http://groups.google.com/group/thinking-sphinx?hl=en.
