OK, I have a good discussion going on the Sphinx forums:
http://www.sphinxsearch.com/forum/view.html?id=3475

On May 3, 7:07 pm, wbharding <[email protected]> wrote:
> We've got a lot of data!
>
> That 2-hour run indexes two tables, one of which has about 2 million rows and
> the other has about 1 million rows.  The table with two million rows
> also has to index about 20 different attributes, many of which are
> accessed through multi-model associations.
>
> I remember the days of indexing being possible within a few seconds.
> If your indexing is that fast, it may be a workable hackaround to just
> re-index every 5-10 minutes using cron...?
>
> Bill
>
> On May 3, 4:03 pm, agib <[email protected]> wrote:
>
> > Hmm... interesting, and thank you for the feedback! Still seems like
> > there isn't an ideal set up for this. May I ask how many rows and what
> > kind of rows lead to 2hr indexing? Right now I have 20,000 rows being
> > indexed and it only takes a few seconds to run.
>
> > I really don't know too much about sphinx itself, but I wonder if
> > there's a way to use its built-in distributed index like this:
>
> > server A1: app + sphinx (delta index only)
> > server A2: app + sphinx (delta index only)
> > server An: ...
> > server B: db + sphinx (cron full indexing + clear all app server delta
> > indexes)
>
> > Maybe I'll post a question on the sphinx forums too.
>
> > -ajg-
>
> > On May 3, 6:56 pm, wbharding <[email protected]> wrote:
>
> > > We build our indexes on a remote machine (that uses a slave version of
> > > our DB), then sftp the resulting index files to our web servers, each
> > > of which runs its own TS instance that uses cron to send a SIGHUP
> > > that refreshes the search, similar to what it sounds like Josh is
> > > describing.
>
> > > Two weeks ago, I spent a couple days trying to update this
> > > configuration so we could use time-based delta indexing on that remote
> > > machine to rebuild our indexes more frequently.  However, we ran
> > > into a number of instances where this broke search in a variety of
> > > interesting ways... everything from only parts of the search string
> > > being used, to partial results being returned (i.e., only items older
> > > than 3 months).
>
> > > Ultimately, we reverted back to just doing full indexes and sftping
> > > them (as described in first paragraph).  I'm not entirely sure which
> > > aspect of the delta process is to blame for our troubles (was it the
> > > Sphinx merging?  The Thinking Sphinx time-stamp delta indexing?  Or
> > > just our own code?), but we went through a lot of pain when we tried
> > > to combine delta indexing with multiple servers.
>
> > > Seeing as how our indexing now takes almost two hours (and ideally our
> > > main site search would be updated once/hour or more), we'll surely
> > > have to revisit this before too much longer.  I'll post the results if/
> > > when I manage to crack this nut.
>
> > > Bill
>
> > > On May 2, 4:16 am, Josh <[email protected]> wrote:
>
> > > > Sorry, I neglected half of your question.  In our case, we run both a
> > > > daily full-index and a more frequent delta index on one machine.
> > > > Regardless of which type of index we are running, we rename the
> > > > resulting files and push them to each server that runs searchd, and
> > > > send the SIGHUP signal to get the indexes refreshed.
>
> > > > The downside of this is that we can't use thinking_sphinx's spiffy
> > > > indexing tasks, but it does work well.  Again, I'm not sure how easy
> > > > this is under EC2, I don't have any experience there.
>
> > > > - Josh
>
> > > > On May 1, 10:26 am, agib <[email protected]> wrote:
>
> > > > > Hi Josh, thank you for the response, but I still don't see how that
> > > > > fixes the deltas issue...
>
> > > > > Is anyone using sphinx's built-in distributed searching feature?
> > > > > Wouldn't that be the best solution to this problem?
>
> > > > > On May 1, 7:59 am, Josh <[email protected]> wrote:
>
> > > > > > > There are a few ways to do this, though I'm not sure what will work on
> > > > > > > EC2, check out this thread:
>
> > > > > > > http://groups.google.com/group/thinking-sphinx/browse_thread/thread/b...
>
> > > > > > -Josh
>
> > > > > > On May 1, 2:51 am, agib <[email protected]> wrote:
>
> > > > > > > > I'm not sure I understand how to get the deltas working on a 2+ server
> > > > > > > > environment... let's say I have server A (app + sphinx) and server B
> > > > > > > > (app). If a request to server B updates a model that has :delta =>
> > > > > > > > true, how does the sphinx index on server A get updated? Do I have to
> > > > > > > > set up some sort of shared filesystem? I'm on EC2 and I'm not sure
> > > > > > > > that's possible... I used to have A (app + sphinx) and B (app +
> > > > > > > > sphinx) but then I realized that it was possible for both servers to
> > > > > > > > return different results (i.e. I could refresh a search result page
> > > > > > > > and get alternating results). Is there any good solution for remote
> > > > > > > > delta indexes?
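
For anyone skimming the thread, here is a rough sketch of the "build on one box, push the files, SIGHUP searchd" workflow that Bill and Josh describe above. It is not their actual script. The function name, host names, paths and pid file location are all made up for illustration, it uses scp where Bill mentions sftp, and the ".new." rename is my assumption about how searchd expects rotated files to be named, so check that against your Sphinx version. It only builds the list of commands and does not run anything.

```python
import shlex
from pathlib import PurePosixPath


def plan_index_push(index_files, servers, remote_dir, pid_file):
    """Build the commands that push freshly built index files to each
    search server and then ask searchd on that server to reload.

    Hypothetical helper: returns argv lists, executes nothing.
    """
    commands = []
    for server in servers:
        for path in index_files:
            name = PurePosixPath(path).name
            stem, ext = name.rsplit(".", 1)
            # Assumption: searchd swaps in <index>.new.<ext> files on SIGHUP.
            target = f"{server}:{remote_dir}/{stem}.new.{ext}"
            commands.append(["scp", path, target])
        # Signal only after every file for this server has been copied,
        # so searchd never sees a half-pushed index.
        reload_cmd = f"kill -HUP $(cat {shlex.quote(pid_file)})"
        commands.append(["ssh", server, reload_cmd])
    return commands


if __name__ == "__main__":
    # Example values only; none of these hosts or paths come from the thread.
    plan = plan_index_push(
        ["/tmp/idx/post_core.spd", "/tmp/idx/post_core.spi"],
        ["web1", "web2"],
        "/var/sphinx",
        "/var/run/searchd.pid",
    )
    for argv in plan:
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Keeping this as a dry-run plan makes it easy to eyeball from cron before wiring it to subprocess. Note that it does nothing for the delta problem itself: every server still only sees whatever was in the last pushed build.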
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Thinking Sphinx" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/thinking-sphinx?hl=en
-~----------~----~----~----~------~----~------~--~---
