Re: [ts] Using Thinking Sphinx in conjunction with Amazon S3?

Pat Allan Wed, 10 Mar 2010 16:02:58 -0800

Hi Graham

Sphinx only talks to (some) SQL databases, or XML. It sounds like the latter is 
the better option (write a script to pull the data out of S3 into an XML stream 
every time you index), but the catch there is that Thinking Sphinx doesn't have 
any hooks for XML sources at this point.


Also, I'm not sure if you can mix SQL and XML sources into the same index, but 
there will be ways to work around that (at a pinch, the XML script would just 
talk to the database as well as S3).
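
To make the XML idea a bit more concrete, here's a rough sketch of what such a 
script could look like, written in Python purely for illustration. It emits 
Sphinx's xmlpipe2 document format. Everything specific in it is an assumption: 
the (id, name, url) row shape is a guess at the MySQL table Graham describes, 
the field names "name" and "content" are made up, and fetch_text is a 
hypothetical callable standing in for whatever actually downloads a file from 
S3. None of this is part of Thinking Sphinx.

```python
import sys
from xml.sax.saxutils import escape, quoteattr


def xmlpipe2_stream(rows, fetch_text):
    """Yield an xmlpipe2 document set, one chunk at a time.

    rows: iterable of (doc_id, name, url) tuples, e.g. read from the
          MySQL table that tracks the S3 files (assumed shape).
    fetch_text: hypothetical callable taking a URL and returning the
          stored text as a str (e.g. an HTTP GET against S3).
    """
    yield '<?xml version="1.0" encoding="utf-8"?>\n'
    yield '<sphinx:docset>\n'

    # Declare the full-text fields up front; the names are illustrative.
    yield '<sphinx:schema>\n'
    yield '<sphinx:field name="name"/>\n'
    yield '<sphinx:field name="content"/>\n'
    yield '</sphinx:schema>\n'

    for doc_id, name, url in rows:
        # One fetch per document: this is the slow part of indexing.
        body = fetch_text(url)
        # Document ids must be unique positive integers.
        yield '<sphinx:document id=%s>\n' % quoteattr(str(int(doc_id)))
        # Escape &, < and > so stored HTML does not break the XML.
        yield '<name>%s</name>\n' % escape(name)
        yield '<content>%s</content>\n' % escape(body)
        yield '</sphinx:document>\n'

    yield '</sphinx:docset>\n'


def write_stream(rows, fetch_text, out=sys.stdout):
    """Write the whole stream to out (stdout by default)."""
    for chunk in xmlpipe2_stream(rows, fetch_text):
        out.write(chunk)
```

Sphinx would run a script like this through a source of type xmlpipe2, with 
xmlpipe_command pointing at it, in a hand-written sphinx.conf. The search 
results give you document ids, which you'd then map back to URLs via the MySQL 
table. The sketch does not strip characters that are invalid in XML, so real 
lesson HTML may need extra cleaning first.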

This also means that indexing is going to become much slower, because of all 
the HTTP requests involved (unless there's some way of getting to the data in 
bulk?).


Every now and then someone raises an issue like yours, but it's not been often 
enough to make it a high priority... I appreciate your offer of funding this 
kind of extension (well, if you're still interested, now you have an idea of 
what's involved), but I've got no spare time for the next two months - even 
Rails 3 support is a while away.

So, if you're cool with waiting, then maybe this is an option. Otherwise, I'm 
not sure what the best way forward is for you. If this is a key feature, then 
maybe Sphinx/TS isn't the best search solution for you currently.

Cheers

-- 
Pat
e: [email protected]      || m: +614 1327 3337
w: http://freelancing-gods.com   || t: twitter.com/pat
bounce: http://trampolineday.com || skype: patallan

On 10/03/2010, at 5:30 AM, Graham Glass wrote:

> Hi everyone,
> 
> I'm the founder of EDU 2.0 (http://www.edu20.org) and due to the rapid
> growth of the site, much of the text that used to be stored in MySQL
> is now being stored in Amazon S3.
> 
> For example, if a lesson is created that has 100K of HTML in it, we
> store the text itself in S3 and just hold its name + URL in MySQL.
> This approach has allowed us to shrink our MySQL memory requirements
> dramatically, which is important for the long-term.
> 
> We already use Sphinx to index things like messages, forum postings,
> etc. but not lessons. Now we'd like to start indexing the lessons as
> well. So my question is - what is the best way to use Thinking Sphinx
> if text is stored in S3? Is there a way to use Sphinx to update its
> indexes as files are uploaded to S3? That way, when I search for a
> term, I could get back a list of URLs where the terms were found.
> 
> One last thing: we keep a MySQL table that keeps track of every single
> S3 file + its URL, so it's easy to correlate the S3 files with their
> URLs.
> 
> I would be willing to fund extensions to Thinking Sphinx to make this
> possible!
> 
> Cheers,
> Graham
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Thinking Sphinx" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to 
> [email protected].
> For more options, visit this group at 
> http://groups.google.com/group/thinking-sphinx?hl=en.
> 

