We've got a lot of data!

Those 2 hours cover indexing two tables, one with about 2 million rows
and the other with about 1 million.  The table with two million rows
also has to index about 20 different attributes, many of which are
accessed through multi-model associations.

I remember the days when indexing took only a few seconds.
If your indexing is that fast, it may be a workable hackaround to just
re-index every 5-10 minutes using cron...?
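
A minimal sketch of that cron idea, assuming a small Python wrapper is
acceptable. The lock path is a placeholder I made up, and the rake task
name is an assumption about your setup. The one thing the wrapper adds
is a non-blocking file lock, so a run that takes longer than the cron
interval is never overlapped by the next one:

```python
import fcntl
import subprocess
import sys

# Assumptions: both values are placeholders, adjust them for your app.
LOCK_PATH = "/tmp/sphinx_reindex.lock"
INDEX_CMD = ["rake", "thinking_sphinx:index"]


def run_exclusive(cmd, lock_path=LOCK_PATH):
    """Run cmd unless another run still holds the lock.

    Returns the command's exit code, or None if the run was skipped.
    """
    with open(lock_path, "w") as lock_file:
        try:
            # Non-blocking exclusive lock: fails at once if already held.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return None
        # The lock is released when lock_file is closed on exit.
        return subprocess.call(cmd)


if __name__ == "__main__":
    code = run_exclusive(INDEX_CMD)
    # A skipped run is not an error, so cron stays quiet about it.
    sys.exit(0 if code is None else code)
```

Cron would then call the script every 10 minutes or so (the script
location is up to you), and any run that starts while the previous one
is still indexing simply exits.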

Bill

On May 3, 4:03 pm, agib <[email protected]> wrote:
> Hmm... interesting, and thank you for the feedback! Still seems like
> there isn't an ideal setup for this. May I ask how many rows and what
> kind of rows lead to 2hr indexing? Right now I have 20,000 rows being
> indexed and it only takes a few seconds to run.
>
> I really don't know too much about sphinx itself, but I wonder if
> there's a way to use its built-in distributed index like this:
>
> server A1: app + sphinx (delta index only)
> server A2: app + sphinx (delta index only)
> server An: ...
> server B: db + sphinx (cron full indexing + clear all app server delta
> indexes)
>
> Maybe I'll post a question on the sphinx forums too.
>
> -ajg-
>
> On May 3, 6:56 pm, wbharding <[email protected]> wrote:
>
> > We build our indexes on a remote machine (that uses a slave version of
> > our DB), then sftp the resulting index files to our web servers, each
> > of which runs its own TS instance that uses cron to send a SIGHUP
> > that refreshes the search, similar to what it sounds like Josh is
> > describing.
>
> > Two weeks ago, I spent a couple days trying to update this
> > configuration so we could use time-based delta indexing on that remote
> > machine to rebuild our indexes more frequently.  However, we ran
> > into a number of instances where this broke search in a variety of
> > interesting ways... everything from only parts of the search string
> > being used, to partial results being returned (i.e., only items older
> > than 3 months).
>
> > Ultimately, we reverted back to just doing full indexes and sftping
> > them (as described in first paragraph).  I'm not entirely sure which
> > aspect of the delta process is to blame for our troubles (was it the
> > Sphinx merging?  The Thinking Sphinx time-stamp delta indexing?  Or
> > just our own code?), but we went through a lot of pain when we tried
> > to combine delta indexing across multiple servers.
>
> > Seeing as how our indexing now takes almost two hours (and ideally our
> > main site search would be updated once/hour or more), we'll surely
> > have to revisit this before too much longer.  I'll post the results if/
> > when I manage to crack this nut.
>
> > Bill
>
> > On May 2, 4:16 am, Josh <[email protected]> wrote:
>
> > > Sorry, I neglected half of your question.  In our case, we run both a
> > > daily full-index and a more frequent delta index on one machine.
> > > Regardless of which type of index we are running, we rename the
> > > resulting files and push them to each server that runs searchd, and
> > > send the SIGHUP signal to get the indexes refreshed.
>
> > > The downside of this is that we can't use thinking_sphinx's spiffy
> > > indexing tasks, but it does work well.  Again, I'm not sure how easy
> > > this is under EC2; I don't have any experience there.
>
> > > - Josh
>
> > > On May 1, 10:26 am, agib <[email protected]> wrote:
>
> > > > Hi Josh, thank you for the response, but I still don't see how that
> > > > fixes the deltas issue...
>
> > > > Is anyone using sphinx's built-in distributed searching feature?
> > > > Wouldn't that be the best solution to this problem?
>
> > > > On May 1, 7:59 am, Josh <[email protected]> wrote:
>
> > > > > There are a few ways to do this, though I'm not sure what will work on
> > > > > EC2. Check out this thread:
>
> > > > > http://groups.google.com/group/thinking-sphinx/browse_thread/thread/b...
>
> > > > > -Josh
>
> > > > > On May 1, 2:51 am, agib <[email protected]> wrote:
>
> > > > > > > I'm not sure I understand how to get the deltas working on a 2+ server
> > > > > > > environment... let's say I have server A (app + sphinx) and server B
> > > > > > > (app). If a request to server B updates a model that has :delta =>
> > > > > > > true, how does the sphinx index on server A get updated? Do I have to
> > > > > > set up some sort of shared filesystem? I'm on EC2 and I'm not sure
> > > > > > that's possible... I used to have A (app + sphinx) and B (app +
> > > > > > sphinx) but then I realized that it was possible for both servers to
> > > > > > return different results (i.e. I could refresh a search result page
> > > > > > and get alternating results). Is there any good solution for remote
> > > > > > delta indexes?