ok I have a good discussion going on the sphinx forums: http://www.sphinxsearch.com/forum/view.html?id=3475
On May 3, 7:07 pm, wbharding <[email protected]> wrote:
> We've got a lot of data!
>
> 2 hours indexes two tables, one of which has about 2 million rows and
> the other has about 1 million rows. The table with two million rows
> also has to index about 20 different attributes, many of which are
> accessed through multi-model associations.
>
> I remember the days of indexing being possible within a few seconds.
> If your indexing is that fast, it may be a workable hackaround to just
> re-index every 5-10 minutes using cron...?
>
> Bill
>
> On May 3, 4:03 pm, agib <[email protected]> wrote:
>
> > Hmm... interesting, and thank you for the feedback! Still seems like
> > there isn't an ideal setup for this. May I ask how many rows and what
> > kind of rows lead to 2hr indexing? Right now I have 20,000 rows being
> > indexed and it only takes a few seconds to run.
>
> > I really don't know too much about sphinx itself, but I wonder if
> > there's a way to use its built-in distributed index like this:
>
> > server A1: app + sphinx (delta index only)
> > server A2: app + sphinx (delta index only)
> > server An: ...
> > server B: db + sphinx (cron full indexing + clear all app server delta
> > indexes)
>
> > Maybe I'll post a question on the sphinx forums too.
>
> > -ajg-
>
> > On May 3, 6:56 pm, wbharding <[email protected]> wrote:
>
> > > We build our indexes on a remote machine (that uses a slave version of
> > > our DB), then sftp the resulting index files to our web servers, each
> > > of which runs its own TS instance that uses cron to send a SIGHUP
> > > that refreshes the search, similar to what it sounds like Josh is
> > > describing.
>
> > > Two weeks ago, I spent a couple days trying to update this
> > > configuration so we could use time-based delta indexing on that remote
> > > machine to rebuild our indexes more frequently. However, we ran
> > > into a number of instances where this broke search in a variety of
> > > interesting ways...
> > > everything from only parts of the search string
> > > being used, to partial results being returned (i.e., only items older
> > > than 3 months).
>
> > > Ultimately, we reverted back to just doing full indexes and sftping
> > > them (as described in first paragraph). I'm not entirely sure which
> > > aspect of the delta process is to blame for our troubles (was it the
> > > Sphinx merging? The Thinking Sphinx time-stamp delta indexing? Or
> > > just our own code?), but we went through a lot of pain when we tried
> > > to combine delta indexing across multiple servers.
>
> > > Seeing as how our indexing now takes almost two hours (and ideally our
> > > main site search would be updated once/hour or more), we'll surely
> > > have to revisit this before too much longer. I'll post the results
> > > if/when I manage to crack this nut.
>
> > > Bill
>
> > > On May 2, 4:16 am, Josh <[email protected]> wrote:
>
> > > > Sorry, I neglected half of your question. In our case, we run both a
> > > > daily full-index and a more frequent delta index on one machine.
> > > > Regardless of which type of index we are running, we rename the
> > > > resulting files and push them to each server that runs searchd, and
> > > > send the SIGHUP signal to get the indexes refreshed.
>
> > > > The downside of this is that we can't use thinking_sphinx's spiffy
> > > > indexing tasks, but it does work well. Again, I'm not sure how easy
> > > > this is under EC2, I don't have any experience there.
>
> > > > - Josh
>
> > > > On May 1, 10:26 am, agib <[email protected]> wrote:
>
> > > > > Hi Josh, thank you for the response, but I still don't see how that
> > > > > fixes the deltas issue...
>
> > > > > Is anyone using sphinx's built-in distributed searching feature?
> > > > > Wouldn't that be the best solution to this problem?
>
> > > > > On May 1, 7:59 am, Josh <[email protected]> wrote:
>
> > > > > > There are a few ways to do this, though I'm not sure what will work on
> > > > > > EC2, check out this thread:
>
> > > > > > http://groups.google.com/group/thinking-sphinx/browse_thread/thread/b...
>
> > > > > > -Josh
>
> > > > > > On May 1, 2:51 am, agib <[email protected]> wrote:
>
> > > > > > > I'm not sure I understand how to get the deltas working on a 2+ server
> > > > > > > environment... let's say I have server A (app + sphinx) and server B
> > > > > > > (app). If a request to server B updates a model that has :delta =>
> > > > > > > true, how does the sphinx index on server A get updated? Do I have to
> > > > > > > set up some sort of shared filesystem? I'm on EC2 and I'm not sure
> > > > > > > that's possible... I used to have A (app + sphinx) and B (app +
> > > > > > > sphinx) but then I realized that it was possible for both servers to
> > > > > > > return different results (i.e. I could refresh a search result page
> > > > > > > and get alternating results). Is there any good solution for remote
> > > > > > > delta indexes?
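
For anyone skimming the thread, here is a rough sketch of the "build on one machine, push the files, SIGHUP searchd" workflow that Josh and Bill describe above. It only builds the list of commands and does not run them. Several things here are assumptions and not details from the thread: it uses scp where Bill mentions sftp, it assumes the "rename" step means inserting `.new` before the extension (the naming I understand searchd to look for when it rotates on SIGHUP), and the host names, paths, and pid file location are all made up for illustration.

```python
import shlex
from pathlib import PurePosixPath


def rotated_name(filename: str) -> str:
    """Insert '.new' before the extension, e.g. 'post_core.spd' -> 'post_core.new.spd'.

    Assumption: this is the naming searchd expects for files it should rotate in.
    """
    stem, dot, ext = filename.rpartition(".")
    if not dot:
        raise ValueError(f"expected a file name with an extension: {filename!r}")
    return f"{stem}.new.{ext}"


def push_plan(index_files, hosts, remote_dir, pid_file):
    """Return the commands (as argv lists) that would push index files and refresh searchd.

    For each host: one scp per index file, then one ssh that sends SIGHUP to searchd.
    Nothing is executed here; the caller decides how to run or log the plan.
    """
    plan = []
    for host in hosts:
        for path in index_files:
            name = PurePosixPath(path).name
            target = f"{host}:{PurePosixPath(remote_dir) / rotated_name(name)}"
            plan.append(["scp", path, target])
        # Hypothetical pid file location; quoted in case the path has odd characters.
        plan.append(["ssh", host, f"kill -HUP $(cat {shlex.quote(pid_file)})"])
    return plan


if __name__ == "__main__":
    for command in push_plan(
        ["/build/post_core.spd", "/build/post_core.spi"],  # hypothetical index files
        ["web1", "web2"],                                   # hypothetical web servers
        "/var/sphinx",
        "/var/run/searchd.pid",
    ):
        print(" ".join(command))
```

Keeping the plan separate from execution makes it easy to dry-run from cron and inspect what would be copied before pointing it at real servers.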

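
The "alternating results" problem from the original question (two app servers, each with its own sphinx and its own delta index) can be illustrated with a toy model. This is not Sphinx or Thinking Sphinx code; the `AppServer` class, the document names, and the round-robin balancer are all invented for the illustration. Each server shares the same core index but only sees delta updates from requests it handled itself.

```python
from itertools import cycle


class AppServer:
    """Toy stand-in for one app server running its own local sphinx instance."""

    def __init__(self, name, core_docs):
        self.name = name
        self.core = set(core_docs)  # full index, identical on every server
        self.delta = set()          # delta index, local to this server only

    def save(self, doc):
        """A write handled by this server lands only in this server's delta."""
        self.delta.add(doc)

    def search(self):
        """Search sees the shared core plus whatever is in the local delta."""
        return sorted(self.core | self.delta)


def simulate(requests=4):
    """Save one record via server B, then send searches round-robin to A and B."""
    core = ["post-1", "post-2"]
    a, b = AppServer("A", core), AppServer("B", core)
    b.save("post-3")  # the update is only visible in B's delta
    balancer = cycle([a, b])
    return [next(balancer).search() for _ in range(requests)]


if __name__ == "__main__":
    for result in simulate():
        print(result)
```

Refreshing the search page alternates between a result set with and without the new record, which is why the replies in this thread converge on building indexes in one place and shipping the same files to every server.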