A brief update - the index on the delta column did the trick (surprise, I know ;)).
Just for fun: collected 1702286 docs, 119.8 MB total 491.290 sec, 243840.36 bytes/sec, 3464.93 docs/sec dpc On Mar 24, 9:19 pm, Pat Allan <[email protected]> wrote: > Yup, putting an index on the column is a good idea - and if you want > to put through a patch that only changes delta to 0 WHERE delta = 1, > then default_delta is the right place to make that change. Not sure if > it'd make it faster, but you've got a decent sized dataset to check :) > > Cheers > > -- > Pat > > On 25/03/2009, at 3:15 PM, Damon P. Cortesi wrote: > > > > > That'd certainly be a good start, wouldn't it hehe. > > > I don't seem to have an index on there, I'll update that this weekend > > and see if it helps. > > > On Mar 24, 5:08 am, James Healy <[email protected]> wrote: > >> I'm indexing a table with 500K records and delta indexing, and > >> building > >> the core index only takes a couple of minutes. It's hard to imagine > >> triple the number of records should blow out the indexing time by a > >> factor of more than 20. > > >> In my case, each time the core index is rebuilt there's probably only > >> 3000-4000 records with delta set to 1 - I'm not sure if that makes a > >> difference to indexing time. > > >> Is your delta column indexed? At first thought I wouldn't think that > >> would matter for an update query, but maybe it does? > > >> Indexing your delta column is good practice anyway, at the very > >> least it > >> speeds up the building of the delta index. > > >> -- James Healy <jimmy-at-deefa-dot-com> Tue, 24 Mar 2009 23:02:42 > >> +1100 > > >> Damon P. Cortesi wrote: > > >>> I've got a table with about 1.5m entries that I'm indexing using > >>> ThinkingSphinx (Twitter data, bios specifically - tweepsearch.com). > > >>> I have delta indexing enabled, which works fantastic. But as the > >>> size > >>> of the table has grown, as has the indexing time for the core index. > >>> As an example, I have a `rake ts:index` task running right now > >>> that's > >>> been going for 60 minutes. Not on indexing, though, on a db query > >>> - a > >>> "show processlist" in MySQL shows the following query: > >>> UPDATE `users` SET `delta` = 0 > > >>> So I'm assuming this task is attempting to set the `delta` column of > >>> every row in my table, which is leading to this delay. It seems like > >>> this query is originating out of the reset_query method in: > >>> lib/thinking_sphinx/deltas/default_delta.rb > > >>> I considered adding a WHERE clause to this to see if that might > >>> help, > >>> but wasn't quite sure this was the right place, or even if that > >>> would > >>> be appropriate. > > >>> Any insight would be appreciated, > > >>> dpc > > >>> -- > >>> Damon P. Cortesi > >>> Security Guy, Twitter Apps > >>> www. tweetstats | tweepsearch | tweetsum .com > > --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Thinking Sphinx" group. To post to this group, send email to [email protected] To unsubscribe from this group, send email to [email protected] For more options, visit this group at http://groups.google.com/group/thinking-sphinx?hl=en -~----------~----~----~----~------~----~------~--~---
