We've got a lot of data! Indexing takes 2 hours and covers two tables, one with about 2 million rows and the other with about 1 million. The 2-million-row table also indexes about 20 different attributes, many of which are accessed through multi-model associations.
I remember the days when indexing took only a few seconds. If your indexing is that fast, it may be a workable hackaround to just re-index every 5-10 minutes using cron...?

Bill

On May 3, 4:03 pm, agib wrote:
> Hmm... interesting, and thank you for the feedback! It still seems like
> there isn't an ideal setup for this. May I ask how many rows, and what
> kind of rows, lead to 2-hour indexing? Right now I have 20,000 rows being
> indexed and it only takes a few seconds to run.
>
> I really don't know much about Sphinx itself, but I wonder if
> there's a way to use its built-in distributed index like this:
>
> server A1: app + sphinx (delta index only)
> server A2: app + sphinx (delta index only)
> server An: ...
> server B: db + sphinx (cron full indexing + clear all app server delta
> indexes)
>
> Maybe I'll post a question on the Sphinx forums too.
>
> -ajg-
>
> On May 3, 6:56 pm, wbharding wrote:
>
> > We build our indexes on a remote machine (that uses a slave version of
> > our DB), then sftp the resulting index files to our web servers, each
> > of which runs its own TS instance that uses cron to send a SIGHUP
> > that refreshes the search, similar to what it sounds like Josh is
> > describing.
> >
> > Two weeks ago, I spent a couple of days trying to update this
> > configuration so we could use time-based delta indexing on that remote
> > machine to rebuild our indexes more frequently. However, we ran
> > into a number of cases where this broke search in a variety of
> > interesting ways... everything from only parts of the search string
> > being used, to partial results being returned (i.e., only items older
> > than 3 months).
> >
> > Ultimately, we reverted to just doing full indexes and sftping
> > them (as described in the first paragraph). I'm not entirely sure which
> > aspect of the delta process is to blame for our troubles (was it the
> > Sphinx merging? The Thinking Sphinx timestamp delta indexing? Or
> > just our own code?), but we went through a lot of pain when we tried
> > to combine delta indexing with multiple servers.
> >
> > Seeing as how our indexing now takes almost two hours (and ideally our
> > main site search would be updated once an hour or more), we'll surely
> > have to revisit this before too much longer. I'll post the results
> > if/when I manage to crack this nut.
> >
> > Bill
> >
> > On May 2, 4:16 am, Josh wrote:
> >
> > > Sorry, I neglected half of your question. In our case, we run both a
> > > daily full index and a more frequent delta index on one machine.
> > > Regardless of which type of index we are running, we rename the
> > > resulting files, push them to each server that runs searchd, and
> > > send the SIGHUP signal to get the indexes refreshed.
> > >
> > > The downside of this is that we can't use thinking_sphinx's spiffy
> > > indexing tasks, but it does work well. Again, I'm not sure how easy
> > > this is under EC2; I don't have any experience there.
> > >
> > > - Josh
> > >
> > > On May 1, 10:26 am, agib wrote:
> > >
> > > > Hi Josh, thank you for the response, but I still don't see how that
> > > > fixes the deltas issue...
> > > >
> > > > Is anyone using Sphinx's built-in distributed searching feature?
> > > > Wouldn't that be the best solution to this problem?
> > > >
> > > > On May 1, 7:59 am, Josh wrote:
> > > >
> > > > > There are a few ways to do this, though I'm not sure what will
> > > > > work on EC2. Check out this thread:
> > > > >
> > > > > http://groups.google.com/group/thinking-sphinx/browse_thread/thread/b...
> > > > >
> > > > > -Josh
> > > > >
> > > > > On May 1, 2:51 am, agib wrote:
> > > > >
> > > > > > I'm not sure I understand how to get the deltas working in a
> > > > > > 2+ server environment... let's say I have server A (app + sphinx)
> > > > > > and server B (app).
> > > > > > If a request to server B updates a model that has
> > > > > > :delta => true, how does the Sphinx index on server A get
> > > > > > updated? Do I have to set up some sort of shared filesystem? I'm
> > > > > > on EC2 and I'm not sure that's possible... I used to have A (app +
> > > > > > sphinx) and B (app + sphinx), but then I realized that it was
> > > > > > possible for the two servers to return different results (i.e. I
> > > > > > could refresh a search result page and get alternating results).
> > > > > > Is there any good solution for remote delta indexes?

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Thinking Sphinx" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/thinking-sphinx?hl=en
-~----------~----~----~----~------~----~------~--~---
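The thread describes a build-once, push-everywhere workflow. Josh and Bill both build the index on one machine, rename the resulting files, push them to every server that runs searchd, and send SIGHUP. Because every server then serves the same files, agib's "alternating results" problem cannot arise.

The sketch below illustrates that workflow; it is not Josh's or Bill's actual script, and it rests on several assumptions:

- **Renaming.** "Rename the resulting files" is taken to mean renaming them to searchd's rotation names, `<index>.new.<ext>`. These are the files searchd swaps in on SIGHUP, the same mechanism `indexer --rotate` uses.
- **Placeholders.** The host names, paths, pid file and the index name `product_core` are all hypothetical.
- **Build machine.** No searchd is serving the files on the build machine.

```python
import shlex
import subprocess
from pathlib import Path

# Hypothetical deployment values; none of these come from the thread.
CONFIG = "/etc/sphinx/production.sphinx.conf"
INDEX_DIR = Path("/var/sphinx/indexes")
REMOTE_DIR = "/var/sphinx/indexes"
PID_FILE = "/var/run/searchd.pid"
HOSTS = ["web1", "web2"]
INDEXES = ["product_core"]


def rotation_name(filename: str) -> str:
    """Map 'product_core.spd' to 'product_core.new.spd'.

    Assumption: the receiving searchd swaps in '<name>.new.<ext>' files
    when it gets SIGHUP (the mechanism `indexer --rotate` relies on).
    """
    stem, _, ext = filename.rpartition(".")
    return f"{stem}.new.{ext}"


def index_files(index_dir: Path, indexes: list[str]) -> list[Path]:
    """Collect built index files, skipping the .spl lock file and .new. leftovers."""
    files = []
    for name in indexes:
        for path in sorted(index_dir.glob(f"{name}.*")):
            if path.suffix != ".spl" and ".new." not in path.name:
                files.append(path)
    return files


def push_plan(files, hosts, remote_dir, pid_file):
    """Build the scp/ssh commands (argv lists) for one push cycle."""
    plan = []
    for host in hosts:
        # Copy every file before signalling, so this script never
        # triggers a rotation on a partial set of files.
        for path in files:
            dest = f"{host}:{remote_dir}/{rotation_name(path.name)}"
            plan.append(["scp", "-q", str(path), dest])
        # The remote shell expands $(cat ...) into searchd's PID.
        plan.append(["ssh", host, f"kill -HUP $(cat {shlex.quote(pid_file)})"])
    return plan


def run(cmd, dry_run):
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


def main(dry_run=True):
    # Build once, on this machine only, so every server gets identical files.
    run(["indexer", "--config", CONFIG, *INDEXES], dry_run)
    files = index_files(INDEX_DIR, INDEXES)
    for cmd in push_plan(files, HOSTS, REMOTE_DIR, PID_FILE):
        run(cmd, dry_run)


if __name__ == "__main__":
    main(dry_run=True)
```

Run as-is, it only prints the commands it would execute. Calling `main(dry_run=False)` runs them, and the script could then be scheduled from cron at whatever interval the index build time allows. Keeping the command construction in `push_plan` separate from execution makes the plan easy to inspect before pointing it at production servers.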
